Abstract

We propose a class of robust estimates for multivariate linear models.
Based on the approach of MM estimation (Yohai 1987, [27]), we estimate
the regression coefficients and the covariance matrix of the errors
simultaneously. These estimates have both a high breakdown point and high
asymptotic efficiency under Gaussian errors. We prove consistency and
asymptotic normality assuming errors with an elliptical distribution. We
describe an iterative algorithm for the numerical calculation of these
estimates. The advantages of the proposed estimates over their competitors
are demonstrated through both simulated and real data.
1 Introduction



Consider a multivariate linear model (MLM) with random predictors, i.e., we
observe $n$ independent identically distributed (i.i.d.) $(p+q)$-dimensional
vectors $\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')'$ with $1 \le i \le n$, where
$\mathbf{y}_i = (y_{i1}, \ldots, y_{iq})' \in \mathbb{R}^q$,
$\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})' \in \mathbb{R}^p$ and $'$ denotes the transpose.
The $\mathbf{y}_i$ are the response vectors and the $\mathbf{x}_i$ are the predictors,
and both satisfy the equation

$$\mathbf{y}_i = \mathbf{B}_0'\mathbf{x}_i + \mathbf{u}_i, \qquad 1 \le i \le n, \tag{1.1}$$

where $\mathbf{B}_0 \in \mathbb{R}^{p \times q}$ is the matrix of the regression parameters and
$\mathbf{u}_i$ is a $q$-dimensional vector independent of $\mathbf{x}_i$. If $x_{ip} = 1$ for
all $1 \le i \le n$, we obtain a regression model with intercept.

We denote the distributions of $\mathbf{x}_i$ and $\mathbf{u}_i$ by $G_0$ and $F_0$,
respectively, and $\Sigma_0$ is the covariance matrix of the $\mathbf{u}_i$. The
$p$-multivariate normal distribution with mean vector $\mu$ and covariance matrix
$\Sigma$ is denoted by $N_p(\mu, \Sigma)$.

In the case of $\mathbf{u}_i$ with distribution $N_q(\mathbf{0}, \Sigma_0)$, the maximum
likelihood estimate (MLE) of $\mathbf{B}_0$ is the least squares estimate (LSE), and the
MLE of $\Sigma_0$ is the sample covariance matrix of the residuals. It is known that
these estimates are not robust: a small fraction of outliers may have a large effect
on their values.

Several approaches have been proposed to deal with this problem. The first
proposal of a robust estimate for the MLM was given by Koenker and Portnoy [H].
They proposed to apply a regression M-estimator, based on a convex loss function,
to each coordinate of the response vector. The problems with this estimate are its
lack of affine equivariance and its zero breakdown point. Several other estimates
without these problems were defined later. Rousseeuw et al. [22] proposed estimates
for the MLM based on a robust estimate of the covariance matrix of
$\mathbf{z} = (\mathbf{x}', \mathbf{y}')'$.
Bilodeau and Duchesne [S] extended the S-estimates introduced by Davies [7] for
multivariate location and scatter; then Van Aelst and Willems [26] studied the
robustness of these estimators. Agulló et al. [1] extended the minimum covariance
determinant estimate introduced by Rousseeuw [21] and Roelandt et al. [20]
extended the definition of GS-estimates introduced by Croux et al. [6]. These
estimates have a high breakdown point but are not highly efficient when the errors
are Gaussian and $q$ is small. In order to solve this, Agulló et al. [1] improved
the efficiency of their estimates, maintaining their high breakdown point, by
considering one-step reweighting and one-step Newton-Raphson GM-estimates.
García Ben et al. [8] extended $\tau$-estimates for multivariate regression, obtaining
an estimate with high breakdown point and high Gaussian efficiency. Another
important approach to obtain robust and efficient estimates is constrained M (CM)
estimation, proposed by Mendes and Tyler [T7j for regression and by Kent and
Tyler [TH] for multivariate location and scatter. The bias of CM estimates for
regression was studied by Berrendero et al. [3]. Following this approach, Bai et
al. [2] proposed CM estimates for the multivariate linear model.

In this paper we propose robust estimates for the linear model based on the 
MM approach, first proposed by Yohai [27] for the univariate linear model, and 
later by Lopuhaa [15], Tatsuoka et al. [2S] and Salibian-Barrera et al. [23] for 
multivariate location and scatter. We show that our estimates have both a high 
breakdown point and a high normal efficiency. 

In Section 2 we define MM-estimates for the MLM and prove some properties.
In Sections 3 and 4 we study their breakdown point and influence function. In
Sections 5 and 6 we study the asymptotic properties (consistency and asymptotic
normality) of the MM-estimates assuming random predictors and errors with an
elliptical unimodal distribution. In Section 7 we describe a computing algorithm
based on an iterative weighted MLE. In Section 8 we present the results of a
simulation study and in Section 9 we analyze a real example. All the proofs can be
found in the Appendix.



2 Definition and properties 

Before defining our class of robust estimates for the MLM, we will define a robust 
estimate of scale. 

Definition 1. Given a sample of size $n$, $\mathbf{v} = (v_1, \ldots, v_n)$, an M-estimate
of scale $s(\mathbf{v})$ is defined as the value of $s$ that is a solution of

$$\frac{1}{n}\sum_{i=1}^{n} \rho_0\!\left(\frac{v_i}{s}\right) = b, \tag{2.1}$$

where $b \in (0, 1)$, or $s = 0$ if $\#\{i : v_i = 0\} > n(1 - b)$, where $\#$ is the symbol
for cardinality.

In this paper we use $b = 0.5$, which ensures the maximal asymptotic breakdown
point (see [TT]).

The function $\rho_0$ should satisfy the following definition.

Definition 2. A $\rho$-function will denote a function $\rho(u)$ which is a continuous
nondecreasing function of $|u|$ such that $\rho(0) = 0$, $\sup_u \rho(u) = 1$, and $\rho(u)$ is
increasing for nonnegative $u$ such that $\rho(u) < 1$.

Note that according to the terminology of Maronna et al. [TB] this would be
a "bounded $\rho$-function". A popular $\rho$-function is the bisquare function:

$$\rho_B(u) = 1 - (1 - u^2)^3 \, I(|u| \le 1), \tag{2.2}$$

where $I(\cdot)$ is the indicator function.
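As an illustration of Definition 1, the following is a minimal sketch of how an M-estimate of scale with a bisquare $\rho$-function could be computed numerically. The function names, the tuning constant `c0` (so that $\rho_0(u) = \rho_B(u/c_0)$) and the use of bisection are assumptions made for this example; the paper does not prescribe a particular solver.

```python
import numpy as np

def rho_bisquare(u):
    """Bisquare rho-function: 1 - (1 - u^2)^3 for |u| <= 1, and 1 otherwise."""
    t = np.minimum(np.abs(u), 1.0)
    return 1.0 - (1.0 - t**2) ** 3

def m_scale(v, c0, b=0.5, n_bisect=200):
    """Solve mean(rho_bisquare(v_i / (c0 * s))) = b for s (illustrative bisection)."""
    v = np.abs(np.asarray(v, dtype=float))
    # Convention of Definition 1: s = 0 when too many values are exactly zero.
    if np.sum(v == 0) > len(v) * (1 - b):
        return 0.0
    mean_rho = lambda s: np.mean(rho_bisquare(v / (c0 * s)))
    # mean_rho is nonincreasing in s, so we bracket the root and bisect.
    lo = hi = np.median(v[v > 0])
    while mean_rho(hi) > b:
        hi *= 2.0
    while mean_rho(lo) < b:
        lo /= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if mean_rho(mid) > b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the left-hand side of (2.1) is monotone in $s$, bisection is a simple and safe choice here; faster fixed-point schemes exist but are not needed for a sketch.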

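Definition 3. Given a vector $\mathbf{u}$ and a positive definite matrix $\mathbf{V}$, the
Mahalanobis norm of $\mathbf{u}$ with respect to $\mathbf{V}$ is defined as

$$d(\mathbf{u}, \mathbf{V}) = (\mathbf{u}'\mathbf{V}^{-1}\mathbf{u})^{1/2}.$$

For particular given $\mathbf{B} \in \mathbb{R}^{p \times q}$ and $\Sigma \in \mathbb{R}^{q \times q}$, we denote by
$d_i(\mathbf{B}, \Sigma)$ ($i = 1, \ldots, n$) the Mahalanobis norms of the residuals with respect
to the matrix $\Sigma$, that is,

$$d_i(\mathbf{B}, \Sigma) = (\mathbf{u}_i(\mathbf{B})'\Sigma^{-1}\mathbf{u}_i(\mathbf{B}))^{1/2},$$

with $\mathbf{u}_i(\mathbf{B}) = \mathbf{y}_i - \mathbf{B}'\mathbf{x}_i$.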
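The Mahalanobis norms of the residuals can be computed for all observations at once. The sketch below assumes the data are stored row-wise (an $n \times p$ array `X` and an $n \times q$ array `Y`); the function name and storage layout are illustrative choices, not part of the paper.

```python
import numpy as np

def residual_norms(Y, X, B, Sigma):
    """Mahalanobis norms d_i(B, Sigma) of the residuals u_i(B) = y_i - B'x_i."""
    U = Y - X @ B                          # row i holds u_i(B)'
    solved = np.linalg.solve(Sigma, U.T)   # column i holds Sigma^{-1} u_i(B)
    return np.sqrt(np.sum(U * solved.T, axis=1))
```

Solving the linear system instead of forming $\Sigma^{-1}$ explicitly is a standard numerical precaution.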

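Using the concepts defined before, we can describe an MM-estimate for the
MLM by the following procedure:

Let $(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n)$ be an initial estimate of $(\mathbf{B}_0, \Sigma_0)$, with high
breakdown point and such that $|\tilde{\Sigma}_n| = 1$, where $|\cdot|$ denotes the determinant.
Compute the Mahalanobis norms of the residuals using $(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n)$,

$$d_i(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n) = \left(\mathbf{u}_i(\tilde{\mathbf{B}}_n)'\tilde{\Sigma}_n^{-1}\mathbf{u}_i(\tilde{\mathbf{B}}_n)\right)^{1/2}, \qquad 1 \le i \le n. \tag{2.3}$$

Then, compute the M-estimate of scale $\hat{\sigma}_n := s(\mathbf{d}(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n))$ of the
above norms, defined by (2.1), using a function $\rho_0$ as specified in Definition 2 and
$b = 0.5$. Let $\rho_1$ be another $\rho$-function such that

$$\rho_1 \le \rho_0 \tag{2.4}$$

and let $\mathcal{S}_q$ be the set of all positive definite symmetric $q \times q$ matrices.
Let $(\hat{\mathbf{B}}_n, \hat{\Gamma}_n)$ be any local minimum of

$$S(\mathbf{B}, \Gamma) = \sum_{i=1}^{n} \rho_1\!\left(\frac{d_i(\mathbf{B}, \Gamma)}{\hat{\sigma}_n}\right) \tag{2.5}$$

in $\mathbb{R}^{p \times q} \times \mathcal{S}_q$, which satisfies

$$S(\hat{\mathbf{B}}_n, \hat{\Gamma}_n) \le S(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n) \tag{2.6}$$

and $|\hat{\Gamma}_n| = 1$. Then the MM-estimate of $\mathbf{B}_0$ is defined as $\hat{\mathbf{B}}_n$, and the
respective estimate of $\Sigma_0$ is

$$\hat{\Sigma}_n = \hat{\sigma}_n^2 \hat{\Gamma}_n. \tag{2.7}$$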

In the MM-estimates for the univariate linear model the residuals are used
as a tool for outlier detection; in the MM-estimates for the multivariate linear
model the Mahalanobis norms of the residuals play the same role. To compute
the M-scale it is necessary to have an initial estimate of $\mathbf{B}_0$, to compute the
residuals, and an initial estimate of the shape of $\Sigma_0$, $\Sigma_0/|\Sigma_0|^{1/q}$, to compute
the Mahalanobis norms of the residuals.

Remark 1. One way of choosing the $\rho$-functions $\rho_0$ and $\rho_1$ so that
they satisfy (2.4) is the following. Let $\rho$ be a $\rho$-function and let $0 < c_0 < c_1$. We
take

$$\rho_0(u) = \rho(u/c_0) \quad \text{and} \quad \rho_1(u) = \rho(u/c_1). \tag{2.8}$$

The value $c_0$ should be chosen such that the asymptotic value of $\hat{\sigma}_n$ is one when
the errors $\mathbf{u}_i$, with $i = 1, \ldots, n$, have distribution $N_q(\mathbf{0}, \mathbf{I})$. The choice of $c_1$
will determine the asymptotic efficiency of the MM-estimate. For more details see
Remark 5.

The following theorem implies that the absolute minimum of $S(\mathbf{B}, \Gamma/|\Gamma|^{1/q})$
in $\mathbb{R}^{p \times q} \times \mathcal{S}_q$ exists. Clearly, from this absolute minimum we can obtain an
MM-estimate. However, any other local minimum $(\hat{\mathbf{B}}, \hat{\Gamma})$ which satisfies (2.6) may
also be used to get an MM-estimate with high breakdown point and with high
efficiency under Gaussian errors.

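Before stating the theorem we define $k_n$ as the maximum number of observations
$(\mathbf{y}_i', \mathbf{x}_i')'$ of a sample that are in a hyperplane, i.e.,

$$k_n := \max_{\|\mathbf{v}\| + \|\mathbf{w}\| > 0} \#\{i : \mathbf{v}'\mathbf{x}_i + \mathbf{w}'\mathbf{y}_i = 0\}. \tag{2.9}$$

Theorem 1. Let $Z = \{\mathbf{z}_1, \ldots, \mathbf{z}_n\}$ be a sample of size $n$ satisfying the MLM (1.1),
where $\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')'$. If $k_n/n < 0.5$ then there is a pair $(\hat{\mathbf{B}}_n, \hat{\Gamma}_n)$ that
minimizes the function $S(\mathbf{B}, \Gamma)$, defined in (2.5), for all
$(\mathbf{B}, \Gamma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q$ such that $|\Gamma| = 1$.

The proof of this theorem can be found in the Appendix.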

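In the following theorem we obtain the estimating equations of MM-estimates.

Theorem 2. Assume that $\rho_1$ is differentiable. Then the MM-estimates
$(\hat{\mathbf{B}}_n, \hat{\Sigma}_n)$ satisfy the following equations:

$$\sum_{i=1}^{n} W\!\left(d_i(\hat{\mathbf{B}}_n, \hat{\Sigma}_n)\right) \mathbf{u}_i(\hat{\mathbf{B}}_n)\,\mathbf{x}_i' = \mathbf{0}, \tag{2.10}$$

$$\hat{\Sigma}_n = q\,\frac{\sum_{i=1}^{n} W\!\left(d_i(\hat{\mathbf{B}}_n, \hat{\Sigma}_n)\right) \mathbf{u}_i(\hat{\mathbf{B}}_n)\,\mathbf{u}_i(\hat{\mathbf{B}}_n)'}{\sum_{i=1}^{n} \psi_1\!\left(d_i(\hat{\mathbf{B}}_n, \hat{\Sigma}_n)\right) d_i(\hat{\mathbf{B}}_n, \hat{\Sigma}_n)}, \tag{2.11}$$

where $\psi_1(u) = \rho_1'(u)$ and $W(u) = \psi_1(u)/u$.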

Remark 2. As we can see in equation (2.10), the $j$th column of $\hat{\mathbf{B}}_n$ is the weighted
LSE corresponding to the univariate regression whose dependent variable is the
$j$th component of $\mathbf{y}$, whose vector of independent variables is the same as in the
multivariate regression, and in which observation $i$ receives the weight
$W(d_i(\hat{\mathbf{B}}_n, \hat{\Sigma}_n))$. Furthermore, by (2.11), $\hat{\Sigma}_n$ is proportional to the sample
covariance matrix of the weighted residuals with the same weights. As these weights
depend on the estimates $\hat{\mathbf{B}}_n$ and $\hat{\Sigma}_n$, we cannot use the relations (2.10) and (2.11)
to compute the estimates directly, but they will be used to formulate an iterative
algorithm in Section 7.

Remark 3. If $\tilde{\mathbf{B}}_n$ is regression-, affine- and scale-equivariant and $\tilde{\Sigma}_n$ is
affine-equivariant and regression- and scale-invariant, then $\hat{\mathbf{B}}_n$ will be regression-,
affine- and scale-equivariant and $\hat{\Sigma}_n$ will be regression- and scale-invariant and
affine-equivariant.

3 Breakdown Point 

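Now, to investigate the robustness of the MM-estimates, we will seek a lower
bound of their finite sample breakdown point. The finite sample breakdown point
of the coefficient matrix estimate is the smallest fraction of outliers that makes
the estimator unbounded, and the finite sample breakdown point of the covariance
matrix estimate is the smallest fraction of outliers that makes the estimate
unbounded or singular.

Let $Z = \{\mathbf{z}_1, \ldots, \mathbf{z}_n\}$ be a sample of size $n$ that satisfies the MLM (1.1), where
$\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')'$, and let $\hat{\mathbf{B}}$ and $\hat{\Sigma}$ be estimates of $\mathbf{B}_0$ and $\Sigma_0$
respectively. We define

$$\mathcal{Z}_m = \{Z^* = \{\mathbf{z}_1^*, \ldots, \mathbf{z}_n^*\} \text{ such that } \#\{i : \mathbf{z}_i = \mathbf{z}_i^*\} \ge n - m\},$$

$$S_m(Z, \hat{\mathbf{B}}) = \sup\{\|\hat{\mathbf{B}}(Z^*)\|_2 \text{ with } Z^* \in \mathcal{Z}_m\},$$

$$S_m^{+}(Z, \hat{\Sigma}) = \sup\{\lambda_1(\hat{\Sigma}(Z^*)) \text{ with } Z^* \in \mathcal{Z}_m\}$$

and

$$S_m^{-}(Z, \hat{\Sigma}) = \inf\{\lambda_q(\hat{\Sigma}(Z^*)) \text{ with } Z^* \in \mathcal{Z}_m\},$$

where $\lambda_1(\hat{\Sigma}(Z^*))$ and $\lambda_q(\hat{\Sigma}(Z^*))$ are the largest and smallest eigenvalues of
$\hat{\Sigma}(Z^*)$ respectively.

Definition 4. The finite sample breakdown point of $\hat{\mathbf{B}}$ is
$\varepsilon^*(Z, \hat{\mathbf{B}}) = m^*/n$ where

$$m^* = \min\{m : S_m(Z, \hat{\mathbf{B}}) = \infty\},$$

the finite sample breakdown point of $\hat{\Sigma}$ is $\varepsilon^*(Z, \hat{\Sigma}) = m^*/n$ where

$$m^* = \min\left\{m : \frac{1}{S_m^{-}(Z, \hat{\Sigma})} + S_m^{+}(Z, \hat{\Sigma}) = \infty\right\}$$

and $\varepsilon_n^*(Z, \hat{\mathbf{B}}, \hat{\Sigma}) = \min\{\varepsilon^*(Z, \hat{\mathbf{B}}), \varepsilon^*(Z, \hat{\Sigma})\}$.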

The following theorem gives a lower bound for the breakdown point of MM-estimates.

Theorem 3. Let $Z = \{\mathbf{z}_1, \ldots, \mathbf{z}_n\}$, with $\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')'$, be a sample that
satisfies the MLM (1.1), and let $k_n$ be defined as in (2.9). Consider two $\rho$-functions
$\rho_0$ and $\rho_1$ that satisfy (2.4) and suppose that $k_n < n/2$. Then

$$\varepsilon_n^*(Z, \hat{\mathbf{B}}_n, \hat{\Sigma}_n) \ge \min\left(\varepsilon_n^*(Z, \tilde{\mathbf{B}}_n, \tilde{\Sigma}_n),\; \frac{[n/2] - k_n}{n}\right). \tag{3.1}$$

Since $k_n$ is always greater than or equal to $p + q - 1$, if
$\varepsilon_n^*(Z, \tilde{\mathbf{B}}_n, \tilde{\Sigma}_n)$ is close to 0.5 the maximum lower bound will be
$([n/2] - (p + q - 1))/n$, i.e. when the points are in general position the finite
sample breakdown point is close to 0.5 for large $n$.

If we did not fix $b = 0.5$ and if $k_n < n(1 - b)$, we would have the same bound
as in (3.1) but with $[n(1 - b)]$ in place of $[n/2]$. In this case, the maximum finite
sample breakdown point would be attained at $b = 0.5 - k_n/n$, which is very close
to our choice of $b = 0.5$ when $k_n/n$ is small.
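To make the size of the bound concrete, the following small sketch evaluates the right-hand side of the lower bound in Theorem 3. It assumes that $[n/2]$ denotes the integer part of $n/2$; the function name and the example values are hypothetical.

```python
def mm_breakdown_lower_bound(n, k_n, eps_initial):
    """Lower bound min(eps_initial, ([n/2] - k_n) / n), valid when k_n < n/2."""
    if 2 * k_n >= n:
        raise ValueError("the bound requires k_n < n/2")
    # [n/2] is taken here as the integer part of n/2 (an assumption).
    return min(eps_initial, (n // 2 - k_n) / n)

# Example: n = 100 points in general position with p = q = 2, so k_n = p + q - 1 = 3,
# and an initial estimate whose breakdown point is taken to be 0.5.
bound = mm_breakdown_lower_bound(n=100, k_n=3, eps_initial=0.5)
```

With these values the second term is $(50 - 3)/100 = 0.47$, so the bound is driven by the geometry of the sample rather than by the initial estimate.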

4 Influence function 

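Consider an estimate $\hat{\theta}_n$ depending on a sample $Z = \{\mathbf{z}_1, \ldots, \mathbf{z}_n\}$ of i.i.d.
variables in $\mathbb{R}^k$ with distribution $H_\theta$, where $\theta \in \Theta \subset \mathbb{R}^m$. Let $T$ be an
estimating functional of $\theta$ such that $T(H_n) = \hat{\theta}_n$, where $H_n$ is the corresponding
empirical distribution. Suppose that $T$ is Fisher consistent, i.e. $T(H_\theta) = \theta$. The
influence function of $T$, introduced by Hampel [1], measures the effect on the
functional of a small fraction of point mass contamination. If $\delta_{\mathbf{z}}$ denotes the
probability distribution that assigns mass 1 to $\mathbf{z}$, then the influence function is
defined by

$$IF(\mathbf{z}, T, \theta) = \lim_{\varepsilon \to 0} \frac{T((1 - \varepsilon)H_\theta + \varepsilon\delta_{\mathbf{z}}) - T(H_\theta)}{\varepsilon} = \left.\frac{\partial T((1 - \varepsilon)H_\theta + \varepsilon\delta_{\mathbf{z}})}{\partial \varepsilon}\right|_{\varepsilon = 0}.$$

In our case, $\mathbf{z} = (\mathbf{y}', \mathbf{x}')'$ satisfies the linear model (1.1), $\theta = (\mathbf{B}_0, \Sigma_0)$ and
$H_\theta = H_0$. Let $T_{0,1}$, $T_{0,2}$ be the functional estimates associated with the initial
estimates $\tilde{\mathbf{B}}_n$ and $\tilde{\Sigma}_n$, and $T_1$, $T_2$ the functional estimates corresponding to
the MM-estimates $\hat{\mathbf{B}}_n$ and $\hat{\Sigma}_n$. Then, according to (2.10) and (2.11), given a
distribution function $H$ of $(\mathbf{y}', \mathbf{x}')'$, the pair $(T_1(H), T_2(H))$ is the value of
$(\mathbf{B}, \Sigma)$ satisfying

$$E_H W(d(\mathbf{B}, \Sigma))\,\mathbf{u}(\mathbf{B})\,\mathbf{x}' = \mathbf{0},$$

$$\Sigma = q\,\frac{E_H W(d(\mathbf{B}, \Sigma))\,\mathbf{u}(\mathbf{B})\,\mathbf{u}(\mathbf{B})'}{E_H \psi_1(d(\mathbf{B}, \Sigma))\,d(\mathbf{B}, \Sigma)}$$

and

$$\Sigma = S(H)^2\,\Gamma, \quad \text{with } |\Gamma| = 1,$$

where $d(\mathbf{B}, \Sigma) = d(\mathbf{u}(\mathbf{B}), \Sigma)$, $\mathbf{u}(\mathbf{B}) = \mathbf{y} - \mathbf{B}'\mathbf{x}$ and $S(H)$ satisfies

$$E_H \rho_0\!\left(\frac{d(T_{0,1}(H), T_{0,2}(H))}{S(H)}\right) = b.$$

Note that the M-estimate of scale, $\hat{\sigma}_n$, used in the definition of the MM-estimates
$(\hat{\mathbf{B}}_n, \hat{\Sigma}_n)$, satisfies $\hat{\sigma}_n = S(H_n)$, where $H_n$ is the empirical distribution of
$\mathbf{z}_1, \ldots, \mathbf{z}_n$.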

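Next we will state the influence function of MM-estimators for the case where
the errors in (1.1) have an elliptical distribution with unimodal density. For that,
we need to make the following assumptions:

(A1) $\rho_1$ is strictly increasing in $[0, \kappa]$ and constant in $[\kappa, +\infty)$ for some constant
$\kappa < \infty$.

(A2) $P_{G_0}(\mathbf{B}'\mathbf{x} = \mathbf{0}) < 0.5$ for all $\mathbf{B} \in \mathbb{R}^{p \times q}$, $\mathbf{B} \ne \mathbf{0}$.

(A3) The distribution $F_0$ of $\mathbf{u}_i$ has a density of the form

$$f_0(\mathbf{u}) = \frac{f_0^*(\mathbf{u}'\Sigma_0^{-1}\mathbf{u})}{|\Sigma_0|^{1/2}}, \tag{4.1}$$

where $f_0^*$ is nonincreasing and has at least one point of decrease in the interval
where $\rho_1$ is strictly increasing.

(A4) $G_0$ has second moments and $E_{G_0}(\mathbf{x}\mathbf{x}')$ is nonsingular.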

Theorem 4. Let $(\mathbf{y}_0', \mathbf{x}_0')'$ be a random vector satisfying the MLM (1.1) with
parameters $\mathbf{B}_0$ and $\Sigma_0$. Assume that (A1)-(A4) hold and that the partial derivatives
of $E_{H_0} W(d(\mathbf{B}_0, \Sigma_0)/S(H_0))\,\mathbf{u}(\mathbf{B}_0)\,\mathbf{x}'$ can be obtained by differentiating with
respect to each parameter inside the expectation, where $H_0$ is the distribution of
$(\mathbf{y}', \mathbf{x}')'$. Suppose that the functional estimates associated with the initial
estimates $\tilde{\mathbf{B}}_n$ and $\tilde{\Sigma}_n$ are affine-equivariant. Then, the influence function for
the functional estimator $T_1$ corresponding to the MM-estimate $\hat{\mathbf{B}}_n$ is

$$IF(\mathbf{z}_0, T_1, \mathbf{B}_0, \Sigma_0) = c\,W\!\left(\frac{\left((\mathbf{y}_0 - \mathbf{B}_0'\mathbf{x}_0)'\Sigma_0^{-1}(\mathbf{y}_0 - \mathbf{B}_0'\mathbf{x}_0)\right)^{1/2}}{\sigma_0}\right)(\mathbf{y}_0 - \mathbf{B}_0'\mathbf{x}_0)\,\mathbf{x}_0'\,E_{G_0}(\mathbf{x}\mathbf{x}')^{-1},$$

where $\sigma_0 = S(H_0)$ and $c$ is a constant.



As in the case of MM-estimators for univariate linear regression, the influence 
function of the proposed MM-estimate is unbounded. 



5 Consistency 

We will now show the consistency of MM-estimates for multivariate regression for
the case in which the errors in (1.1) have an elliptical distribution with a unimodal
density. For this, we need the following additional assumptions:

Theorem 5. Let $(\mathbf{y}_i', \mathbf{x}_i')'$, $1 \le i \le n$, be a random sample of the MLM (1.1)
with parameters $\mathbf{B}_0$ and $\Sigma_0$. Assume that $\rho_0$ and $\rho_1$ are $\rho$-functions that satisfy
the relation (2.4), that (A1)-(A3) hold and that the initial estimates $\tilde{\mathbf{B}}_n$ and
$\tilde{\Sigma}_n$ are consistent for $\mathbf{B}_0$ and $\Gamma_0$ respectively, where
$\Gamma_0 = \Sigma_0|\Sigma_0|^{-1/q}$; then the MM-estimates $\hat{\mathbf{B}}_n$ and $\hat{\Sigma}_n$ satisfy

(a) $\lim_{n \to \infty} \hat{\mathbf{B}}_n = \mathbf{B}_0$ a.s.

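(b) $\lim_{n \to \infty} \hat{\Sigma}_n = \sigma_0^2\Sigma_0$ a.s., with $\sigma_0$ defined by

$$E_{F_0}\,\rho_0\!\left(\frac{(\mathbf{u}'\Sigma_0^{-1}\mathbf{u})^{1/2}}{\sigma_0}\right) = b.$$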
6 Asymptotic Normality

Before obtaining the limit distribution of B„ we need to make some additional 
assumptions. 

(A5) $\rho_1$ is differentiable, $\psi_1 = \rho_1'$ and $W(u) = \psi_1(u)/u$ is differentiable with
bounded derivative.

(A6) £^Gol|xf < oo, ^Gollxf < oo, ^Hollxf ||yf < oo and £:iyj|x||2||yf < oo, 
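where $H_0$ is the distribution of $\mathbf{z} = (\mathbf{y}', \mathbf{x}')'$.

(A7) Let $\theta = (\mathbf{B}, \Sigma)$ and

$$\phi(\mathbf{z}; \theta) = W(d(\mathbf{B}, \Sigma))\,\mathrm{vec}((\mathbf{y} - \mathbf{B}'\mathbf{x})\mathbf{x}'). \tag{6.1}$$

The function $\Phi(\theta) = E_{H_0}\phi(\mathbf{z}; \theta)$ has a partial derivative
$\partial\Phi/\partial\mathrm{vec}(\mathbf{B}')'$ which is continuous at $\theta_0 = (\mathbf{B}_0, \sigma_0^2\Sigma_0)$ and the matrix

$$\mathbf{A} = \left.\frac{\partial\Phi(\theta)}{\partial\mathrm{vec}(\mathbf{B}')'}\right|_{\theta = \theta_0} \tag{6.2}$$

is nonsingular.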

Theorem 6. Let $\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')'$, with $1 \le i \le n$, be a random sample from the
model (1.1) with parameters $\mathbf{B}_0$ and $\Sigma_0$. Assume that the $\rho$-function $\rho_1$ satisfies
(A1), that (A3)-(A7) hold and that the estimates $\tilde{\mathbf{B}}_n$ and $\tilde{\Sigma}_n$ are consistent for
$\mathbf{B}_0$ and $\Gamma_0 = \Sigma_0|\Sigma_0|^{-1/q}$ respectively; then
$n^{1/2}\,\mathrm{vec}(\hat{\mathbf{B}}_n' - \mathbf{B}_0') \xrightarrow{d} N_{qp}(\mathbf{0}, \mathbf{V})$, where $\xrightarrow{d}$ denotes
convergence in distribution and

$$\mathbf{V} = \mathbf{A}^{-1}\mathbf{M}\mathbf{A}^{-1\prime}, \tag{6.3}$$

where $\mathbf{M}$ is the covariance matrix of $\phi(\mathbf{z}_1, (\mathbf{B}_0, \sigma_0^2\Sigma_0))$, with $\phi$ defined in
(6.1), and $\mathbf{A}$ is defined in (6.2).

Assumptions (A4)-(A7) are sufficient to prove Theorem 6, but we conjecture
that the limit distribution of $\hat{\mathbf{B}}_n$ can be proved under less restrictive hypotheses.

Remark 4. Note that the rate of convergence of the MM-estimates depends only
on the consistency, but not on the rate of convergence, of the initial estimates.

Under suitable differentiability conditions we can obtain a more detailed
expression of the covariance matrix $\mathbf{V}$ of Theorem 6.

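Proposition 7. If $W_1(u) = W(\sqrt{u})$ is differentiable with bounded derivative and
the initial estimates $(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n)$ are affine-equivariant, then

$$\mathbf{V} = \frac{\sigma_0^2\,E_{F_0}\psi_1^2(v/\sigma_0)}{q\left[E_{F_0}W^*(v/\sigma_0)\right]^2}\,(E_{G_0}\mathbf{x}\mathbf{x}')^{-1} \otimes \Sigma_0, \tag{6.4}$$

where

$$W^*(u) = W(u) + \frac{2}{q}\,u^2\,W_1'(u^2)$$

and

$$v = (\mathbf{u}'\Sigma_0^{-1}\mathbf{u})^{1/2}. \tag{6.5}$$

From the proof of Proposition 7 (see Appendix), it is easily seen that if $W_1(u)$
is continuously differentiable with bounded derivative, assumption (A7) holds if
and only if $E_{F_0}W^*\!\left((\mathbf{u}'\Sigma_0^{-1}\mathbf{u})^{1/2}/\sigma_0\right) \ne 0$.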

Remark 5. The covariance matrix of the MLE is given by

$$\mathbf{V} = \frac{E_{F_0}(v^2)}{q}\,(E_{G_0}\mathbf{x}\mathbf{x}')^{-1} \otimes \Sigma_0.$$

Then the asymptotic relative efficiency of the MM-estimate $\hat{\mathbf{B}}_n$ with respect to the
MLE is

$$ARE(\psi_1, F_0) = \frac{E_{F_0}(v^2)\left[E_{F_0}W^*(v/\sigma_0)\right]^2}{\sigma_0^2\,E_{F_0}\psi_1^2(v/\sigma_0)}. \tag{6.6}$$

As we mentioned in Remark 1, to obtain an MM-estimate which simultaneously
has high breakdown point and high efficiency under normal errors it suffices
to choose $c_0$ and $c_1$ in (2.8) appropriately. The constant $c_0$ can be chosen so that

$$E\,\rho\!\left(\frac{(\mathbf{u}'\Sigma_0^{-1}\mathbf{u})^{1/2}}{c_0}\right) = b, \tag{6.7}$$

where $\mathbf{u}$ is $N_q(\mathbf{0}, \Sigma_0)$, $\Sigma_0 = |\Sigma_0|^{1/q}\Gamma_0$ and $b = 0.5$; this ensures a high
breakdown point and that the asymptotic relative efficiency (6.6) depends only on
$c_1$. Then, $c_1$ can be chosen so that the MM-estimate has the desired efficiency without
affecting the breakdown point, which depends only on $c_0$.

Table 1 gives the values of $c_0$ verifying (6.7) for different values of $q$. Table 2
gives the values of $c_1$ needed to attain different levels of asymptotic efficiency. In
both cases the function $\rho$ from (2.8) is equal to the bisquare function, $\rho_B$, given
in (2.2).

    q       1      2      3      4      5      10
    c_0     1.56   2.66   3.45   4.10   4.65   6.77

Table 1: Values of $c_0$ for the bisquare function.
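As a rough numerical check of how such constants can be obtained, the sketch below solves $E\,\rho_B(\|\mathbf{u}\|/c_0) = 0.5$ for $\mathbf{u} \sim N_q(\mathbf{0}, \mathbf{I})$, which is the condition described in Remark 1. It uses the fact that $\|\mathbf{u}\|$ follows a chi distribution with $q$ degrees of freedom. The function names, the Simpson quadrature and the bisection bracket are assumptions made for this illustration, not the authors' procedure.

```python
import math

def chi_density(r, q):
    """Density of the norm of a N_q(0, I) vector (chi distribution, q d.o.f.)."""
    return r ** (q - 1) * math.exp(-0.5 * r * r) / (2.0 ** (q / 2.0 - 1.0) * math.gamma(q / 2.0))

def expected_rho(c, q, steps=2000):
    """E rho_B(R / c) = 1 - int_0^c (1 - (r/c)^2)^3 f_q(r) dr, by composite Simpson."""
    h = c / steps
    total = 0.0
    for k in range(steps + 1):
        r = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * (1.0 - (r / c) ** 2) ** 3 * chi_density(r, q)
    return 1.0 - total * h / 3.0

def solve_c0(q, b=0.5, lo=0.1, hi=50.0, n_bisect=60):
    """Bisection on c: expected_rho is decreasing in c."""
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if expected_rho(mid, q) > b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c0_q2 = solve_c0(2)
```

Under these assumptions the root for $q = 2$ is close to the value 2.66 reported in Table 1.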



    ARE     q = 1   q = 2   q = 3   q = 4   q = 5   q = 10
    0.80    3.14    3.51    3.82    4.10    4.34    5.39
    0.90    3.88    4.28    4.62    4.91    5.18    6.38
    0.95    4.68    5.12    5.48    5.76    6.10    7.67

Table 2: Values of $c_1$ for the bisquare function to attain given values of the
asymptotic relative efficiency (ARE) under normal errors.

7 Computing algorithm

In this section we propose an iterative algorithm to compute $\hat{\mathbf{B}}_n$ and $\hat{\Sigma}_n$ based
on Remark 2. Let $\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')'$, $1 \le i \le n$, be a sample of size $n$ and assume we
have computed the initial estimates $\tilde{\mathbf{B}}_n$ and $\tilde{\Sigma}_n$ with high breakdown point and
such that $|\tilde{\Sigma}_n| = 1$.

1. Using the initial values $\mathbf{B}^{(0)} = \tilde{\mathbf{B}}_n$ and $\Gamma^{(0)} = \tilde{\Sigma}_n$, compute the
M-estimate of scale $\hat{\sigma}_n := s(\mathbf{d}(\mathbf{B}^{(0)}, \Gamma^{(0)}))$, defined by (2.1), using a function
$\rho_0$ as in the definition and $b = 0.5$, and the matrix $\Sigma^{(0)} = \hat{\sigma}_n^2\,\Gamma^{(0)}$.

2. Compute the weights $\omega_{i0} = W\!\left(d_i(\mathbf{B}^{(0)}, \Sigma^{(0)})\right)$ for $1 \le i \le n$. These weights
are used to compute each column of $\mathbf{B}^{(1)}$ separately by weighted least
squares.

3. Compute the matrix

$$\mathbf{C}^{(1)} = \sum_{i=1}^{n} \omega_{i0}\,\mathbf{u}_i(\mathbf{B}^{(1)})\,\mathbf{u}_i(\mathbf{B}^{(1)})',$$

and with it the matrix $\Sigma^{(1)} = \hat{\sigma}_n^2\,\mathbf{C}^{(1)}/|\mathbf{C}^{(1)}|^{1/q}$.

4. Suppose that we have already computed $\mathbf{B}^{(k-1)}$ and $\Sigma^{(k-1)}$. Then $\mathbf{B}^{(k)}$ and
$\Sigma^{(k)}$ are computed using steps 2 and 3 but starting from $\mathbf{B}^{(k-1)}$ and
$\Sigma^{(k-1)}$ instead of $\mathbf{B}^{(0)}$ and $\Sigma^{(0)}$.

5. The procedure is stopped at step $k$ if the relative absolute differences of
all elements of the matrices $\mathbf{B}^{(k)}$ and $\mathbf{B}^{(k-1)}$ and the relative absolute
differences of all the Mahalanobis norms of residuals $\mathbf{u}_i(\mathbf{B}^{(k)})$ and
$\mathbf{u}_i(\mathbf{B}^{(k-1)})$ with respect to $\Sigma^{(k)}$ and $\Sigma^{(k-1)}$ respectively are smaller than
a given value $\delta$.

The following theorem, whose proof can be found in the Appendix, shows
that the iterative procedure to compute MM-estimates yields the descent of the
objective function.

Theorem 8. If $W(u)$ is nonincreasing in $|u|$ then at each iteration of the
algorithm the function $\sum_{i=1}^{n}\rho_1(d_i(\mathbf{B}, \Sigma))$ is nonincreasing.
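The five steps above can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the authors' software: the data are stored row-wise, both $\rho$-functions are bisquare with tuning constants `c0` and `c1`, the M-scale is found by bisection, all norms are assumed strictly positive, and the function names are hypothetical. The initial pair `(B_init, Gamma_init)` must be supplied by the caller (the paper uses S-estimates).

```python
import numpy as np

def rho_bisquare(u, c):
    t = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - t**2) ** 3

def weight_bisquare(u, c):
    """W(u) = psi(u)/u for rho(u) = rho_B(u/c); it is nonincreasing in |u|."""
    t = np.minimum(np.abs(u) / c, 1.0)
    return 6.0 / c**2 * (1.0 - t**2) ** 2

def norms(Y, X, B, Sigma):
    U = Y - X @ B
    return np.sqrt(np.sum(U * np.linalg.solve(Sigma, U.T).T, axis=1))

def m_scale(d, c0, b=0.5):
    f = lambda s: np.mean(rho_bisquare(d / s, c0))   # nonincreasing in s
    lo = hi = np.median(d)
    while f(hi) > b:
        hi *= 2.0
    while f(lo) < b:
        lo /= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > b else (lo, mid)
    return 0.5 * (lo + hi)

def converged(new, old, delta):
    return np.all(np.abs(new - old) <= delta * (np.abs(old) + 1e-12))

def mm_irwls(Y, X, B_init, Gamma_init, c0, c1, delta=1e-8, max_iter=500):
    q = Y.shape[1]
    sigma = m_scale(norms(Y, X, B_init, Gamma_init), c0)        # step 1
    B, Sigma = B_init, sigma**2 * Gamma_init
    d = norms(Y, X, B, Sigma)
    objective = [np.sum(rho_bisquare(d, c1))]
    for _ in range(max_iter):
        w = weight_bisquare(d, c1)                              # step 2: weights
        Xw = X * w[:, None]
        B_new = np.linalg.solve(X.T @ Xw, Xw.T @ Y)             # weighted LS, all columns
        U = Y - X @ B_new                                       # step 3
        C = (U * w[:, None]).T @ U
        Sigma_new = sigma**2 * C / np.linalg.det(C) ** (1.0 / q)
        d_new = norms(Y, X, B_new, Sigma_new)
        objective.append(np.sum(rho_bisquare(d_new, c1)))
        done = converged(B_new, B, delta) and converged(d_new, d, delta)   # step 5
        B, Sigma, d = B_new, Sigma_new, d_new                   # step 4: iterate
        if done:
            break
    return B, Sigma, objective
```

Solving the weighted normal equations for all responses at once gives the same result as fitting each column separately, since the weights are shared across columns. The returned `objective` list records the value of $\sum_i \rho_1(d_i)$ at each iteration, which makes the descent property of Theorem 8 easy to inspect.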

8 Simulation 

8.1 Simulation design 

To investigate the performance of the proposed estimates we performed a simu- 
lation study. 

- We consider the MLM given by (1.1) for two cases: $p = 2$, $q = 2$ and $p = 2$,
$q = 5$. Due to the equivariance of the estimators we take, without loss of
generality, $\mathbf{B}_0 = \mathbf{0}$ and $\Sigma_0 = \mathbf{I}_q$. The errors $\mathbf{u}_i$ are generated from an
$N_q(\mathbf{0}, \mathbf{I})$ distribution and the predictors $\mathbf{x}_i$ from an $N_p(\mathbf{0}, \mathbf{I})$ distribution.

- The sample size is 100 and the number of replications is 1000. We consider
uncontaminated samples and samples that contain 10% of identical outliers of
the form $(\mathbf{x}_0, \mathbf{y}_0)$ with $\mathbf{x}_0 = (x_0, 0, \ldots, 0)$ and $\mathbf{y}_0 = (m x_0, 0, \ldots, 0)$. The
values of $x_0$ considered are 1 (low leverage outliers) and 10 (high leverage
outliers). We take a grid of values of $m$, starting at 0. The grid was chosen in order
that all robust estimates attain the maximum values of their error measure.

- Let $\hat{\mathbf{B}}^{(k)}$ be the estimate of $\mathbf{B}_0$ obtained in the $k$th replication. Then, since we
are taking $\mathbf{B}_0 = \mathbf{0}$, the estimate of the mean squared error (MSE) is given by

$$\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{p}\sum_{j=1}^{q}\left(\hat{B}^{(k)}_{ij}\right)^2,$$

where $N = 1000$ is the number of replications.

It must be recalled that the distributions of robust estimates under contamination
are themselves heavy-tailed, and it is therefore prudent to evaluate their
performance through robust measures (see [11] Sec. 1.4, p. 12, and [10] p. 75). For
this reason, we employed both the MSE and the trimmed mean squared error (TMSE),
which computes the 10% (upper) trimmed average of

$$\sum_{i=1}^{p}\sum_{j=1}^{q}\left(\hat{B}^{(k)}_{ij}\right)^2, \qquad k = 1, \ldots, N.$$

The results given below correspond to this MSE, although the TMSE yields
qualitatively similar results (in the uncontaminated case the results are the same).
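The simulation design can be sketched as follows. The code generates one contaminated sample of the form described above and summarizes a list of coefficient estimates by the MSE and by an upper-trimmed MSE. It assumes that the squared error of a replication is the sum of squared entries of the estimate (which applies because $\mathbf{B}_0 = \mathbf{0}$) and that the trimmed version simply discards the largest 10% of those values; the function names are hypothetical.

```python
import numpy as np

def contaminated_sample(n, p, q, x0, m, frac=0.10, seed=0):
    """One sample with B_0 = 0 and Sigma_0 = I, plus a fraction of identical outliers."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    Y = rng.normal(size=(n, q))          # y_i = u_i because B_0 = 0
    n_out = int(round(frac * n))
    X[:n_out] = 0.0
    X[:n_out, 0] = x0                    # x_0 = (x0, 0, ..., 0)
    Y[:n_out] = 0.0
    Y[:n_out, 0] = m * x0                # y_0 = (m * x0, 0, ..., 0)
    return X, Y

def mse_and_tmse(estimates, trim=0.10):
    """MSE and upper-trimmed MSE of a list of estimates of B_0 = 0."""
    sq = np.sort([float(np.sum(np.asarray(B) ** 2)) for B in estimates])
    keep = len(sq) - int(np.floor(trim * len(sq)))
    return float(np.mean(sq)), float(np.mean(sq[:keep]))
```

Any estimator can then be applied to each generated sample and the resulting list of coefficient matrices passed to `mse_and_tmse`.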

8.2 Description of the estimators 

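For each case, four estimates are computed: the MLE, an S-estimate, a $\tau$-estimate
and an MM-estimate.

For the MLM, the S-estimates are defined by

$$(\hat{\mathbf{B}}, \hat{\Sigma}) = \arg\min\{|\Sigma| : (\mathbf{B}, \Sigma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q\}$$

subject to

$$s^2(d_1(\mathbf{B}, \Sigma), \ldots, d_n(\mathbf{B}, \Sigma)) = q,$$

where $s$ is an M-estimate of scale.

García Ben et al. [8] extended $\tau$-estimates to the MLM by defining

$$(\hat{\mathbf{B}}, \hat{\Sigma}) = \arg\min\{|\Sigma| : (\mathbf{B}, \Sigma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q\}$$

subject to

$$\tau^2(d_1(\mathbf{B}, \Sigma), \ldots, d_n(\mathbf{B}, \Sigma)) = \kappa, \tag{8.1}$$

where the $\tau$-scale is defined by

$$\tau^2(\mathbf{v}) = \frac{s^2(\mathbf{v})}{n}\sum_{i=1}^{n}\rho_2\!\left(\frac{|v_i|}{s(\mathbf{v})}\right), \tag{8.2}$$

where $\mathbf{v} = (v_1, \ldots, v_n)$, $\rho_2$ is a $\rho$-function and $s$ is an M-estimate of scale.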

The robust estimates are based on bisquare $\rho$-functions. The M-estimate of
scale used in the S-estimate is defined by $\rho_0(u) = \rho_B(u/c_0)$ and $b = 0.5$, so that
the S-estimate has breakdown point 0.5 (see Table 1). The $\tau$-estimate uses the
same $\rho_0$ and $b$ as the S-estimate to compute the M-scale and $\rho_2(u) = \rho_B(u/c_2)$,
where $c_2$ is chosen together with the constant $\kappa$, from equation (8.1), so that the
$\tau$-estimate has an ARE equal to 0.90 when the errors are Gaussian (see Table 2 in
[S] in which k, = 61^2 /c^)- The initial estimate needed to compute the r-estimate 
is computed using 2000 subsamples. The MM-estimate uses the same $\rho_0$ as the
S-estimate to compute the M-estimate of scale and $\rho_1(u) = \rho_B(u/c_1)$, where $c_1$ is
chosen so that the MM-estimate has an ARE equal to 0.90 when the errors are
Gaussian (see Table 2). We use the S-estimates as $(\tilde{\mathbf{B}}_n, \tilde{\Sigma}_n)$ and the value of $\delta$ in
step 5 of the computing algorithm is taken equal to 10~^. 
                    q = 2                         q = 5
    Estimate        MSE    SE     REFF  ARE       MSE    SE     REFF  ARE
    MLE             0.041  0.001  1.00  1.00      0.103  0.002  1.00  1.00
    S-estimate      0.074  0.002  0.55  0.58      0.125  0.002  0.83  0.85
    tau-estimate    0.046  0.001  0.89  0.90      0.116  0.002  0.90  0.90
    MM-estimate     0.046  0.001  0.89  0.90      0.116  0.002  0.90  0.90

Table 3: Simulation: mean squared error (MSE), standard error of the MSE
(SE), relative efficiency (REFF) and asymptotic relative efficiency (ARE) of the
estimates in the uncontaminated case for $n = 100$ and $p = 2$.
Figure 1: Simulation: mean squared errors for $q = 2$ and $x_0 = 1$.

8.3 Results 

Table 3 displays the mean squared errors, the standard errors, and the relative
efficiencies and asymptotic relative efficiencies with respect to the MLE for the
uncontaminated case. It is seen that the relative efficiencies of all robust estimates
(computed as the ratio of the MSE of the MLE to their respective MSEs) are
close to their asymptotic values. The $\tau$- and MM-estimates have similar high
efficiencies, and both outperform the S-estimator.

In Figures 1, 2, 3 and 4 we show the MSEs of the different estimates under
contamination.
Figure 2: Simulation: mean squared errors for $q = 2$ and $x_0 = 10$.

Figure 3: Simulation: mean squared errors for $q = 5$ and $x_0 = 1$.

In Figure 1, which corresponds to $q = 2$ and $x_0 = 1$, we observe that the MM-
and $\tau$-estimates behave similarly, both having a smaller MSE than the S-estimate
except when $m$ is (approximately) between 2.8 and 4. In this case, the S-estimate
has the largest maximum MSE among the robust estimates. As expected, the
MSE of the MLE increases without bound for large $m$. Figure 2 shows the results
for $q = 2$ and $x_0 = 10$. The S-, $\tau$- and MM-estimates behave similarly. In Figure 3,
which corresponds to $q = 5$ and $x_0 = 1$, the three robust estimates are seen to
follow essentially the same pattern. For $m < 4.8$ (approximately) the $\tau$- and
MM-estimates have similar behaviors, both outperforming the S-estimate. For
$m > 4.8$ the S- and MM-estimates have similar behaviors, both outperforming
the $\tau$-estimate. For $q = 5$ and $x_0 = 10$ (Figure 4) the behavior of the robust
estimates is similar to the one observed for $q = 2$ and $x_0 = 10$ (Figure 2).

Figure 4: Simulation: mean squared errors for $q = 5$ and $x_0 = 10$.

9 An example with real data 

In this section we analyze a dataset corresponding to electron-probe X-ray
microanalysis of archeological glass vessels (Janssens et al., [IZj). For each of $n = 180$
vessels we have a spectrum on 1920 frequencies and the contents of 13 chemical
compounds; the purpose is to predict the contents on the basis of the spectra.
In order to limit the size of our data set, we considered only two compounds
(responses), P2O5 and PbO, and chose 12 equispaced frequencies between 100
and 400. This interval was chosen because the values of the spectra are almost null
for frequencies below 100 and above 400. We have therefore $p = 13$ and $q = 2$.

We considered two multivariate regression estimates: the MLE and our
MM-estimate. As initial estimate for the MM-estimate we use an S-estimate. The S-
and the MM-estimates employ bisquare $\rho$-functions with constants such that the
MM-estimate has Gaussian ARE equal to 0.95 and the S-estimate has breakdown
point 0.5. In Figure 5 we present QQ-plots of the Mahalanobis norms of the
residuals of the MLE and the MM-estimate against the square root quantiles of the
chi-squared distribution with $q$ degrees of freedom. The QQ-plot of the MM-estimate
shows clear outliers.

Figure 5: QQ-plots of the Mahalanobis norms of the residuals of the MM-estimate
(left), the MLE (right) and the MM-estimate in the same interval as the MLE
(center). Horizontal axis: square root of quantiles.

              MLE                           MM-estimate
    [  0.0645   -0.0008 ]             [  0.0102   -0.0014 ]
    [ -0.0008    0.0348 ]             [ -0.0014    0.0084 ]

Table 4: MLE and MM-estimate of the covariance matrix of the errors.

Figure 6 compares the sorted absolute values of the residuals of the
MLE with those corresponding to the MM-estimator for each component of the
response.

Figure 6: QQ-plots of sorted absolute residuals of MM-estimates vs sorted
absolute residuals of MLE for each component of the response. Left plot corresponds
to P2O5 (first component) and right to PbO (second component).

    Criterion     MLE             tau-estimate    MM-estimate     MM-univariate
    Component     1      2        1      2        1      2        1      2
    MSE           0.081  0.051    0.351  0.806    0.340  0.682    0.354  0.762
    tau-scale     0.044  0.022    0.005  0.007    0.008  0.006    0.005  0.006

Table 5: Mean squared error (MSE) and $\tau$-scale of the prediction errors of the MLE,
multivariate MM-estimate, $\tau$-estimate and univariate MM-estimate for each
component of the response, computed by cross-validation.

The right and left panels of Figure 5 show respectively the QQ-plots of the
Mahalanobis norms of the residuals of the MLE and the MM-estimate against the
square root quantiles of the chi-squared distribution with $q$ degrees of freedom.
For ease of comparison, the center panel shows the MM-estimate's QQ-plot truncated
to the size of the MLE's QQ-plot. The latter shows a very good fit of the norms
to the chi-squared distribution, and therefore points out no suspect points, while
the MM-estimate's QQ-plot indicates some 30 possible outliers, i.e. about 16%
of the data.
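The coordinates of such a QQ-plot are easy to compute when $q = 2$, as in this example, because the chi-squared quantile function with 2 degrees of freedom has the closed form $-2\log(1 - p)$. The sketch below pairs the sorted norms with the square roots of these quantiles. The plotting positions $(i - 0.5)/n$ and the function names are assumptions made for illustration; the paper does not state which plotting positions were used.

```python
import math

def chi_sqrt_quantiles_q2(n):
    """Square roots of chi-squared(2) quantiles at plotting positions (i - 0.5)/n."""
    return [math.sqrt(-2.0 * math.log(1.0 - (i - 0.5) / n)) for i in range(1, n + 1)]

def qq_pairs(norms):
    """Pairs (theoretical quantile, sorted Mahalanobis norm) for a QQ-plot, q = 2 only."""
    sorted_norms = sorted(norms)
    return list(zip(chi_sqrt_quantiles_q2(len(sorted_norms)), sorted_norms))
```

Points lying far above the diagonal in the upper tail of such a plot correspond to observations with unusually large Mahalanobis norms.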

The MLE's norms are in general smaller than the MM-estimate's norms, but
this does not mean that the former gives a better fit, since here the residuals are
normalized by the respective estimated residual dispersion matrices $\hat{\Sigma}$. Figure 6
compares the sorted absolute values of the (univariate) residuals of the MLE
with those of the MM-estimate for each response. We can see that the majority
of the residuals corresponding to the MM-estimate are smaller than those of the
MLE.

Figure 7: Absolute values of the coordinates of the bidimensional residual vectors
corresponding to the MLE (left) and to the MM-estimate (right).

To understand why the MLE's norms are in general smaller than the
MM-estimate's norms, while its residuals are in general larger, we show
in Table 4 the estimates given by the MLE and MM-estimate of the dispersion
matrix of the errors. It is seen that the former is "much larger" than the latter,
in that its two diagonal elements are respectively six and four times those of the
latter.

To complete the description of the estimates' fit, Figure 7 shows the absolute
values of the coordinates of the bidimensional residual vectors. The right panel
(corresponding to residuals of the MM-estimate) is truncated to the size of the
left panel (corresponding to residuals of the MLE), and consequently 10% of
the absolute residuals of the MM-estimate are not shown. It is seen that, while
the residuals from the MM-estimate have a larger range than those from the MLE,
they are in general more concentrated near the origin. In general, we may conclude
that the MM-estimate gives a good fit to the bulk of the data, at the expense of
misfitting a reduced proportion of atypical points, while the MLE tries to fit all data
points, including the atypical ones, at the cost of a poor fit to the bulk of the data.

We compared the predictive behaviors of the MLE and the MM-estimates 
through five-fold cross validation. We also included the univariate MM-estimates 
corresponding to each component of the response and the τ-estimate proposed 
by García Ben et al. [8]. As initial estimates for the univariate MM-estimates we 
use S-estimates. The τ-, S- and univariate MM-estimates employ bisquare 
ρ-functions with constants such that the univariate MM- and τ-estimates have 
Gaussian ARE equal to 0.95 and the S-estimate has breakdown point 0.5. We 
considered two evaluation criteria: the mean squared error (MSE) and a robust 
criterion, namely a τ-scale (8.2) of the predictive errors, both computed separately 
for each component of the response. In the τ-scale, s is an M-scale with breakdown 
point 0.5 and ρ2 is a bisquare ρ-function with constant such that the τ-scale has 
Gaussian asymptotic efficiency equal to 0.85. 
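
To make the robust criterion more concrete, the sketch below computes only the M-scale ingredient s of the τ-scale, not the τ-scale itself. It assumes a bisquare ρ-function normalized so that its supremum is 1, the commonly quoted tuning constant c of about 1.547 for breakdown point 0.5, and a standard fixed-point iteration; the function names and the contaminated sample are hypothetical and are not taken from the paper.

```python
import numpy as np

def rho_bisquare(u, c):
    """Bisquare rho-function scaled so that sup rho = 1."""
    v = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, b=0.5, c=1.547, n_iter=200, tol=1e-10):
    """Solve mean(rho(r / s)) = b for s by a fixed-point iteration."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745          # starting value (normalized MAD)
    for _ in range(n_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(r / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

# Made-up prediction errors: 95 standard normal values plus 5 gross outliers.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.normal(size=95), np.full(5, 50.0)])

root_mse = np.sqrt(np.mean(errors**2))   # dominated by the 5 outliers
robust_scale = m_scale(errors)           # stays close to the scale of the bulk

print(root_mse, robust_scale)
```

The root MSE is driven almost entirely by the five outliers, while the M-scale remains of the order of the spread of the bulk of the errors, which is the behavior the robust criterion relies on.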

Table shows the results. According to the MSE, the MLE is much better 
than the robust estimates. However, the τ-scales yield the opposite conclusion. 
The reason is the MSE's sensitivity to outliers. This result shows how 
misleading a non-robust criterion may be. According to the τ-scale, the predictive 
performance of our MM-estimate for the second component is slightly better than 
that of the τ-estimate, while the opposite occurs for the first component. The 
results obtained with the univariate MM-estimates are similar to those of the 
multivariate MM. 

The QQ-plots in Figure 8 compare, for each response component, the sorted absolute 
values of the cross-validation prediction errors of our MM-estimate with 
those of the MLE. For reasons of scale, in each QQ-plot the observations with the 
12 largest absolute prediction errors were omitted. We can see that most points 
lie below the identity line, showing that the 
MM-estimate provides a better prediction for the bulk of the data. 

Figure 8: QQ-plots of sorted absolute prediction errors of MM-estimates vs sorted 
absolute prediction errors of MLE for each component of the response, computed 
by cross-validation. Left plot corresponds to P2O5 (first component) and right to 
PbO (second component). 

10 Conclusions 

In this paper we have presented MM-estimates for the multivariate linear model 
and shown that they maintain the same good theoretical properties as in the 
univariate case, such as a high breakdown point and high Gaussian efficiency. 
The simulation study indicates that our MM-estimate has the desired high efficiency, and that 
its behavior is in general similar, and in several situations superior, to that of the 
τ-estimate; it is also more efficient, and in most situations more robust, than the 
S-estimate. In the example with real data, our MM-estimate gives a good fit to 
the bulk of the data, pointing out the existence of atypical points, and shows a 
good predictive behavior. 

A Appendix 

Before showing some of the properties of the MM-estimate, we set the notation 
for norms of vectors and matrices that we will use later: 

Given $\mathbf{v} \in \mathbb{R}^m$, we denote its 2-norm or Euclidean norm as 

\[ \|\mathbf{v}\| = \Big( \sum_{i=1}^{m} v_i^2 \Big)^{1/2}, \]

where $v_i$ represents the $i$th element of $\mathbf{v}$. 

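Given a matrix $\mathbf{A} \in \mathbb{R}^{n \times m}$, its spectral norm $\|\cdot\|_{sp}$ is defined as follows: 

\[ \|\mathbf{A}\| = \|\mathbf{A}\|_{sp} = \max\{ \|\mathbf{A}\mathbf{v}\| : \mathbf{v} \in \mathbb{R}^m \text{ with } \|\mathbf{v}\| = 1 \} = \max\Big\{ \frac{\|\mathbf{A}\mathbf{v}\|}{\|\mathbf{v}\|} : \mathbf{v} \in \mathbb{R}^m \text{ with } \mathbf{v} \neq \mathbf{0} \Big\}. \tag{A.1} \]

Its 2-norm or Frobenius norm $\|\cdot\|_2$ is its Euclidean norm if we think of the 
matrix $\mathbf{A}$ as a vector of $\mathbb{R}^{nm}$, i.e. 

\[ \|\mathbf{A}\|_2 = \Big( \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^2 \Big)^{1/2} = \big( \mathrm{tr}(\mathbf{A}'\mathbf{A}) \big)^{1/2}, \tag{A.2} \]

where $a_{ij}$ represents the $(i,j)$th element of the matrix $\mathbf{A}$ and $\mathrm{tr}(\cdot)$ denotes the 
trace. 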

Given $\mathbf{V} \in \mathbb{R}^{m \times m}$, we denote its eigenvalues as 

\[ \lambda_1(\mathbf{V}) \geq \dots \geq \lambda_m(\mathbf{V}); \]

then, if $\mathbf{V}$ is positive definite, 

\[ \|\mathbf{V}\| = \max_j \{ \lambda_j(\mathbf{V}) \} = \lambda_1(\mathbf{V}). \tag{A.3} \]



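Remark 6. Recall also that for any two norms $\|\cdot\|_a$ and $\|\cdot\|_b$, we have that 

\[ \alpha \|\mathbf{A}\|_a \leq \|\mathbf{A}\|_b \leq \beta \|\mathbf{A}\|_a \]

for some $\alpha$ and $\beta$ and for all matrices $\mathbf{A} \in \mathbb{R}^{n \times m}$. In other words, they are 
equivalent norms, i.e. they induce the same topology in $\mathbb{R}^{n \times m}$. For $\|\cdot\|_a = \|\cdot\|$ 
and $\|\cdot\|_b = \|\cdot\|_2$ we have $\alpha = 1$ and $\beta = \sqrt{k}$, where $k$ is the rank of the matrix 
$\mathbf{A}$. 

Recall also that the spectral norm and the Frobenius norm are matrix norms, i.e., 
for any pair of matrices $\mathbf{A}$ and $\mathbf{B}$ in $\mathbb{R}^{m \times m}$, 

\[ \|\mathbf{A}\mathbf{B}\|_2 \leq \|\mathbf{A}\|_2 \|\mathbf{B}\|_2 \quad \text{and} \quad \|\mathbf{A}\mathbf{B}\| \leq \|\mathbf{A}\| \|\mathbf{B}\|. \tag{A.4} \]

This property will be used several times. 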
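
The norm relations recalled above are easy to check numerically. The following sketch uses arbitrary random matrices (an assumption made only for illustration) and verifies the equivalence constants of Remark 6, the submultiplicative property (A.4) and the identity (A.3) for a positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

def spectral(M):
    """Spectral norm: largest singular value of M."""
    return np.linalg.norm(M, 2)

def frobenius(M):
    """Frobenius norm: Euclidean norm of the entries of M."""
    return np.linalg.norm(M, "fro")

k = np.linalg.matrix_rank(A)

# Remark 6 with alpha = 1 and beta = sqrt(k).
equiv_ok = spectral(A) <= frobenius(A) <= np.sqrt(k) * spectral(A) + 1e-12

# Property (A.4) for both norms.
submult_ok = (frobenius(A @ B) <= frobenius(A) * frobenius(B) + 1e-12
              and spectral(A @ B) <= spectral(A) * spectral(B) + 1e-12)

# Identity (A.3): for a positive definite V the spectral norm is lambda_1(V).
V = A @ A.T + np.eye(4)
largest_eigenvalue = np.linalg.eigvalsh(V)[-1]

print(equiv_ok, submult_ok, spectral(V), largest_eigenvalue)
```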

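Before proving Theorem 1 we will prove the following lemma. 

Lemma 9. Let $\mathbf{y} \in \mathbb{R}^q$ and $\mathbf{x} \in \mathbb{R}^p$ be fixed vectors. The function 

\[ d^2(\mathbf{B}, \Sigma) = (\mathbf{y} - \mathbf{B}'\mathbf{x})' \Sigma^{-1} (\mathbf{y} - \mathbf{B}'\mathbf{x}) \]

is continuous in $\mathbb{R}^{p \times q} \times \mathcal{S}_q$. 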



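Proof: We only give the main ideas of the proof. Without loss of generality, due 
to Remark 6 we can consider in $\mathbb{R}^{p \times q} \times \mathbb{R}^{q \times q}$ the topology induced by the norm 

\[ |||(\mathbf{A}, \mathbf{V})||| = \sup\{ \|\mathbf{A}\|_2, \|\mathbf{V}\| \}. \tag{A.5} \]

Given $(\mathbf{B}_0, \Sigma_0)$ in $\mathbb{R}^{p \times q} \times \mathcal{S}_q$, the proof consists in finding an upper bound of 
$|d^2(\mathbf{B}, \Sigma) - d^2(\mathbf{B}_0, \Sigma_0)|$ that tends to 0 when $|||(\mathbf{B}, \Sigma) - (\mathbf{B}_0, \Sigma_0)||| \to 0$. 
Adding and subtracting $d^2(\mathbf{B}_0, \Sigma)$ we have that 

\[ |d^2(\mathbf{B}, \Sigma) - d^2(\mathbf{B}_0, \Sigma_0)| \leq |d^2(\mathbf{B}, \Sigma) - d^2(\mathbf{B}_0, \Sigma)| + |d^2(\mathbf{B}_0, \Sigma) - d^2(\mathbf{B}_0, \Sigma_0)|. \]

Let $\mathbf{r}(\mathbf{B}_0) = \mathbf{y} - \mathbf{B}_0'\mathbf{x}$. Using basic tools from linear algebra we obtain 

\[ |d^2(\mathbf{B}, \Sigma) - d^2(\mathbf{B}_0, \Sigma)| \leq q\, |||(\mathbf{B}, \Sigma) - (\mathbf{B}_0, \Sigma_0)|||\, (\|\mathbf{x}\| + 2\|\mathbf{r}(\mathbf{B}_0)\|)\, \|\mathbf{x}\|\, \lambda_1(\Sigma^{-1}), \]

and by Weyl's Perturbation Theorem (see [1], pg. 63), we have that 

\[ \lambda_1(\Sigma^{-1}) = \frac{1}{\lambda_q(\Sigma)} \leq \frac{1}{\lambda_q(\Sigma_0) - |||(\mathbf{B}, \Sigma) - (\mathbf{B}_0, \Sigma_0)|||}. \tag{A.6} \]

Combining these inequalities we obtain a bound of $|d^2(\mathbf{B}, \Sigma) - d^2(\mathbf{B}_0, \Sigma)|$ that 
tends to 0 when $|||(\mathbf{B}, \Sigma) - (\mathbf{B}_0, \Sigma_0)||| \to 0$. 

If $\mathbf{r}(\mathbf{B}_0) = \mathbf{0}$ the lemma is proved; otherwise, using the Cauchy-Schwarz 
inequality and (A.6) we have 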



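\[ |d^2(\mathbf{B}_0, \Sigma) - d^2(\mathbf{B}_0, \Sigma_0)| \leq \frac{\|\Sigma_0^{-1}\|\, \|\mathbf{r}(\mathbf{B}_0)\|^2\, |||(\mathbf{B}, \Sigma) - (\mathbf{B}_0, \Sigma_0)|||}{\lambda_q(\Sigma_0) - |||(\mathbf{B}, \Sigma) - (\mathbf{B}_0, \Sigma_0)|||}, \]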



which completes the proof. ■ 
Proof of Theorem 1: By Lemma 9, it suffices to show that there exist $t_1$ and $t_2$ 
such that 

JL /^YRFU JL fdSn,r^)\ ^^^^ 



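where 

\[ \mathcal{C} = \{ (\mathbf{B}, \Gamma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q \text{ with } |\Gamma| = 1 : \lambda_q(\Gamma) \leq t_1 \text{ or } \|\mathbf{B}\|_2 \geq t_2 \}. \tag{A.8} \]

By definition of $k_n$ we have that for all $\theta \in \mathbb{R}^{p+q}$ 

\[ \#\{ i : |\theta'\mathbf{z}_i| > 0 \}/n \geq 1 - (k_n/n). \]

Taking $0.5 < \delta < 1 - (k_n/n)$ and using a compactness argument we can find $\varepsilon > 0$ 
such that 

\[ \inf_{\|\theta\| = 1} \#\{ i : |\theta'\mathbf{z}_i| > \varepsilon \}/n \geq \delta. \tag{A.9} \]

Let $(\mathbf{B}, \Gamma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q$ be such that $|\Gamma| = 1$, let $\Lambda$ be the diagonal matrix of 
eigenvalues of $\Gamma$ ordered from lowest to highest and let $\mathbf{U}$ be the orthonormal matrix 
of eigenvectors of $\Gamma$ which verifies $\Gamma = \mathbf{U}\Lambda\mathbf{U}'$. Then 

\[ d_i^2(\mathbf{B}, \Gamma) = (\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i)'\Gamma^{-1}(\mathbf{y}_i - \mathbf{B}'\mathbf{x}_i) = (\mathbf{U}'\mathbf{y}_i - \mathbf{U}'\mathbf{B}'\mathbf{x}_i)'\Lambda^{-1}(\mathbf{U}'\mathbf{y}_i - \mathbf{U}'\mathbf{B}'\mathbf{x}_i) \geq \frac{(\mathbf{e}'\mathbf{z}_i)^2}{\lambda_q(\Gamma)}, \tag{A.10} \]

with $\mathbf{e} = (-\mathbf{v}_1, \mathbf{u}_1)$, where $\mathbf{u}_1$ and $\mathbf{v}_1$ are the first rows of $\mathbf{U}'$ and $\mathbf{V} = \mathbf{U}'\mathbf{B}'$, 
respectively. 

Since $\|\mathbf{e}\| \geq 1$, by (A.9) we have at least $n\delta$ values of $d_i(\mathbf{B}, \Gamma)$ greater than or equal 
to $\varepsilon/\sqrt{\lambda_q(\Gamma)}$. Hence 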
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f t \ I e"^ 
Let t be such that pi I — = — ^, with 0.5 < 5i < 5, and let ti = — , then if 

\(JnJ 2di V 

\(r) < ti we obtain the inequahty 



If Ag(r) > ti and ||B||2 > ^2? all eigenvalues of T are smaller than and 
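at least one column of $\mathbf{B}$ has a norm greater than or equal to $t_2/\sqrt{q}$. By (A.2), we 
have 

\[ \|\mathbf{V}\|_2^2 = \|\mathbf{U}'\mathbf{B}'\|_2^2 = \mathrm{tr}(\mathbf{B}\mathbf{U}\mathbf{U}'\mathbf{B}') = \mathrm{tr}(\mathbf{B}\mathbf{B}') = \|\mathbf{B}'\|_2^2 = \|\mathbf{B}\|_2^2, \]

and therefore there exists a $k$ such that $\|\mathbf{v}_k\| \geq t_2/\sqrt{q}$, where $\mathbf{v}_k$ is the $k$th row of $\mathbf{V}$. 
Then proceeding as in (A.10) we obtain 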



where $\mathbf{e}_k = (-\mathbf{v}_k, \mathbf{u}_k)$ and $\mathbf{u}_k$ is the $k$th row of $\mathbf{U}'$. 



By flA.Qp . at least n6 values of di(B, T) are greater or equal than e||e;^.||/ ytl 
and ||efc|p = 1 + ||vfe|p > 

t 



Then if we take t2 = -\l q/t\ ^ we obtain 



n 



Then by flAJTl) and (1X12]) 



inf VpW^^V- 
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and by ([23D and 



E/c/j(B„,r„)\ >A ( di{Bn,Tn)\ n 



and this proves the Theorem. ■ 

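Before proving Theorem 2 we will give some results on matrix derivatives that 
will be used later. 

Let $\mathbf{b}$ be a vector and $\mathbf{V}$ be a symmetric matrix; then 

\[ \frac{\partial\, \mathbf{b}'\mathbf{V}\mathbf{b}}{\partial \mathbf{b}'} = 2\mathbf{b}'\mathbf{V}, \tag{A.13} \]

and, if $\mathbf{V}$ is nonsingular, 

\[ \frac{\partial |\mathbf{V}|}{\partial \mathbf{V}} = |\mathbf{V}|\mathbf{V}^{-1} \tag{A.14} \]

and 

\[ \frac{\partial\, \mathbf{b}'\mathbf{V}^{-1}\mathbf{b}}{\partial \mathbf{V}} = -\mathbf{V}^{-1}\mathbf{b}\mathbf{b}'\mathbf{V}^{-1}. \tag{A.15} \]

For further details see Chapter 17 of [21]. 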
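
The three derivative formulas can be checked by finite differences. The sketch below is an illustration only: the matrix, the vector and the helper `numerical_gradient` are hypothetical, and each entry of the matrix is perturbed independently, which agrees with (A.14) and (A.15) when they are evaluated at a symmetric matrix.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    """Central finite differences of the scalar function f at each entry of X."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
V = M @ M.T + 3 * np.eye(3)        # symmetric positive definite matrix
b = rng.normal(size=3)
V_inv = np.linalg.inv(V)

grad_quad = numerical_gradient(lambda v: v @ V @ v, b)                 # (A.13)
grad_det = numerical_gradient(np.linalg.det, V)                        # (A.14)
grad_inv = numerical_gradient(lambda S: b @ np.linalg.inv(S) @ b, V)   # (A.15)

print(np.max(np.abs(grad_quad - 2 * b @ V)))
print(np.max(np.abs(grad_det - np.linalg.det(V) * V_inv)))
print(np.max(np.abs(grad_inv + V_inv @ np.outer(b, b) @ V_inv)))
```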

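Let $(\mathbf{B}, \Sigma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q$. Using $\mathrm{vec}(\mathbf{B}'\mathbf{A}) = (\mathbf{A}' \otimes \mathbf{I}_q)\mathrm{vec}(\mathbf{B}')$, it is easy to check 
that 

\[ \frac{\partial\, \mathrm{vec}(\mathbf{B}'\mathbf{A})}{\partial\, \mathrm{vec}(\mathbf{B}')'} = (\mathbf{A}' \otimes \mathbf{I}_q). \tag{A.16} \]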
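
The vec identity behind (A.16) can be verified directly. The sketch below assumes the usual column-stacking definition of vec and uses random matrices of hypothetical sizes; it also checks the related identity vec(ux') = (x ⊗ I_q)u, which is used later in the appendix.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(3)
p, q, m = 3, 2, 4
B = rng.normal(size=(p, q))
A = rng.normal(size=(p, m))

# vec(B'A) = (A' kron I_q) vec(B')
lhs = vec(B.T @ A)
rhs = np.kron(A.T, np.eye(q)) @ vec(B.T)

# vec(u x') = (x kron I_q) u for u in R^q and x in R^p
u = rng.normal(size=q)
x = rng.normal(size=p)
lhs_outer = vec(np.outer(u, x))
rhs_outer = np.kron(x.reshape(-1, 1), np.eye(q)) @ u

print(np.max(np.abs(lhs - rhs)), np.max(np.abs(lhs_outer - rhs_outer)))
```

Since vec(B'A) is linear in vec(B') with coefficient matrix (A' ⊗ I_q), the derivative (A.16) follows at once.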

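From (A.16) and (A.13) it follows that 

\[ \frac{\partial\, d(\mathbf{B}, \Sigma)}{\partial\, \mathrm{vec}(\mathbf{B}')'} = -\frac{1}{d(\mathbf{B}, \Sigma)}\, \mathbf{u}(\mathbf{B})'\Sigma^{-1}(\mathbf{x}' \otimes \mathbf{I}_q). \tag{A.17} \]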

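Proof of Theorem 2: The definition of MM-estimates can be reformulated, using 
the function $\Gamma(\Sigma) := \Sigma/|\Sigma|^{1/q}$, in the following way: let $(\widehat{\mathbf{B}}_n, \widehat{\mathbf{C}}_n)$ be any 
local minimum of $S^*(\mathbf{B}, \Sigma) = S(\mathbf{B}, \Gamma(\Sigma))$ in $\mathbb{R}^{p \times q} \times \mathcal{S}_q$ which satisfies 

\[ S^*(\widehat{\mathbf{B}}_n, \widehat{\mathbf{C}}_n) \leq S^*(\widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n). \]

Finally, the MM-estimate is defined as $(\widehat{\mathbf{B}}_n, \widehat{\Sigma}_n)$, with 

\[ \widehat{\Sigma}_n = \widehat{\sigma}_n^2\, \Gamma(\widehat{\mathbf{C}}_n). \tag{A.18} \]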



Differentiating S'*(B, S) with respect to B and E we get 



X:^^'<'''<Y'^»^'"'(B,.,a.) = o 



1=1 



dB 



(A.19) 



and 



X:^^i<*<5iI<51)ZM,B„,c„) = o. 



i=l 



By (IA.16p . we have that 



9S 



(A.20) 



gpi (ci,(B,r(S))/a„) 
a(vec(B'))' 



1 ^ ci.(B,r(S)) ^ /u^(B)T(Sri- 



rf,(B,r(S))\ u,(B)T(S)-i(x^®I 



Then, by fUlSl) 



gpi (ci,(B,r(s))K) 

a(vec(B'))' 



B„,C„) = -ly rf,(B„,S„) u,(B„)'S;^(x^0l,). (A.21) 



Using that vec(S-^Ui(B„)x^)' = Ui(B„)'S-^(x^ ® I,), f ETj) and f[09|) . we can 
see that (12.101) is true. 
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Differentiating pi {di(B,r)/an) with respect to S, we get 

9pi (rf.(B, r(s))K) _ 1 / d..(B,r(s)) \ g(rf.(B,s)|s|V(^'^)) ^ ^2) 



From f lA.14p and f lA.lSp we have 



2g H , ; 2rfi(B,S) ' ' 

^"'^d.(B.r(E))-, "-'?!^W;^'^)'' V (A.23) 



2« V <i.(B,r(S)) 
Then, by flA.22p and flA.23p . the equation flA.20p results equivalent to 



V ^" A ^ (i4B„,r(C„))a„ ' 



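Rearranging and using that $W(u) = \psi_1(u)/u$ and that $\widehat{\Sigma}_n = \widehat{\sigma}_n^2\, \Gamma(\widehat{\mathbf{C}}_n)$ we have 
that 

\[ \Big( \sum_{i=1}^{n} \psi_1(d_i(\widehat{\mathbf{B}}_n, \widehat{\Sigma}_n))\, d_i(\widehat{\mathbf{B}}_n, \widehat{\Sigma}_n) \Big) \widehat{\Sigma}_n = q \sum_{i=1}^{n} W(d_i(\widehat{\mathbf{B}}_n, \widehat{\Sigma}_n))\, \mathbf{u}_i(\widehat{\mathbf{B}}_n)\mathbf{u}_i(\widehat{\mathbf{B}}_n)' \]

and solving for $\widehat{\Sigma}_n$ we get (2.11). ■ 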
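
The estimating equation for the dispersion matrix suggests a reweighting step. The sketch below is not the algorithm of the paper; it only evaluates the right-hand side of the equation for a given matrix, under the assumption that W(u) = ψ1(u)/u, so that ψ1(d)d = W(d)d². The function names, the bisquare weight with constant 3 and the simulated residuals are hypothetical.

```python
import numpy as np

def scatter_update(U, Sigma, weight):
    """Evaluate q * sum_i W(d_i) u_i u_i' / sum_i W(d_i) d_i^2 for residual rows u_i."""
    q = U.shape[1]
    d = np.sqrt(np.einsum("ij,jk,ik->i", U, np.linalg.inv(Sigma), U))
    w = weight(d)
    numerator = (U * w[:, None]).T @ U     # sum_i w_i u_i u_i'
    denominator = np.sum(w * d**2)         # sum_i psi(d_i) d_i
    return q * numerator / denominator

def unit_weight(d):
    """Constant weights, which correspond to the non-robust case."""
    return np.ones_like(d)

def bisquare_weight(d, c=3.0):
    """Bisquare-type weight, equal to zero for d > c."""
    return np.where(d <= c, (1.0 - (d / c) ** 2) ** 2, 0.0)

rng = np.random.default_rng(7)
U = rng.normal(size=(50, 2))               # made-up residual vectors
S = U.T @ U / U.shape[0]                   # their sample second-moment matrix

S_fixed = scatter_update(U, S, unit_weight)
S_robust = scatter_update(U, S, bisquare_weight)
print(S_fixed, S_robust, sep="\n")
```

With constant weights the sample second-moment matrix of the residuals is reproduced exactly, which is a convenient sanity check of the normalizing factor q.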

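Before showing Theorem 3 we will prove the following lemma: 

Lemma 10. Let $\mathbf{Z} = \{\mathbf{z}_1, \dots, \mathbf{z}_n\}$, with $\mathbf{z}_i = (\mathbf{y}_i', \mathbf{x}_i')$ that satisfy (1.1), and consider 
a ρ-function $\rho_0$. Then the explosion breakdown point of the M-estimate of 
scale of the Mahalanobis norms (2.3), $\widehat{\sigma}_n := s(\mathbf{d}(\widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n))$, is bounded below by 

\[ \min( \varepsilon_n^*(\mathbf{Z}, \widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n), 0.5 ). \]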
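Proof: Let 

\[ m < \min( n\varepsilon_n^*(\mathbf{Z}, \widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n), n/2 ) \]

and let 

\[ (\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*) = (\widetilde{\mathbf{B}}_n(\mathbf{Z}^*), \widetilde{\Sigma}_n(\mathbf{Z}^*)) \]

be an initial estimate of $(\mathbf{B}_0, \Sigma_0)$ computed with the sample $\mathbf{Z}^* \in \mathcal{Z}_m$. To prove 
the Lemma it suffices to show that $s(\mathbf{d}(\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*))$ is bounded for all $\mathbf{Z}^* \in \mathcal{Z}_m$. 
Since $m < n\varepsilon_n^*(\mathbf{Z}, \widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n)$, there is a compact set $K$ such that 

\[ (\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*) \in K \quad \text{for all } \mathbf{Z}^* \in \mathcal{Z}_m. \]

Then, by Lemma 9 there is a $t$ such that 

\[ \sup_{\{i : \mathbf{z}_i = \mathbf{z}_i^*\}} d_i(\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*) \leq t \quad \text{for all } \mathbf{Z}^* \in \mathcal{Z}_m. \tag{A.24} \]

Since $m/n < 0.5$ we can find a $\gamma > 0$ such that $m/n + \gamma < 0.5$. Let $\delta$ be the 
value that verifies $\rho_0(\delta) = \gamma$ and let $t_0 = t/\delta$. Then, writing $d_i^* = d_i(\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*)$ and using (A.24), we have that 

\[ \frac{1}{n} \sum_{i=1}^{n} \rho_0\Big( \frac{d_i^*}{t_0} \Big) = \frac{1}{n} \sum_{\{i : \mathbf{z}_i = \mathbf{z}_i^*\}} \rho_0\Big( \frac{d_i^*}{t_0} \Big) + \frac{1}{n} \sum_{\{i : \mathbf{z}_i \neq \mathbf{z}_i^*\}} \rho_0\Big( \frac{d_i^*}{t_0} \Big) \leq \frac{n - m}{n}\, \rho_0\Big( \frac{t}{t_0} \Big) + \frac{m}{n} \leq \rho_0(\delta) + \frac{m}{n} = \gamma + \frac{m}{n} < 0.5, \]

thus $s(\mathbf{d}(\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*)) \leq t_0$ for all $\mathbf{Z}^* \in \mathcal{Z}_m$ and the lemma is proved. ■ 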
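Proof of Theorem 3: Let $\varepsilon_n^*(\mathbf{Z}, \widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n)$ be the breakdown point of the initial 
estimate $(\widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n)$ and 

\[ m < \min( n\varepsilon_n^*(\mathbf{Z}, \widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n), [n/2] - k_n ). \]

Let 

\[ (\widehat{\mathbf{B}}_n^*, \widehat{\Sigma}_n^*) = (\widehat{\mathbf{B}}_n(\mathbf{Z}^*), \widehat{\Sigma}_n(\mathbf{Z}^*)) \quad \text{and} \quad (\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*) = (\widetilde{\mathbf{B}}_n(\mathbf{Z}^*), \widetilde{\Sigma}_n(\mathbf{Z}^*)) \]

be respectively an MM-estimate for the MLM and its initial estimate computed 
with the sample $\mathbf{Z}^* \in \mathcal{Z}_m$. 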


Then there exists c < oo, that does not depend on Z*, such that, for at least 
[n/2] observations of Z*, rf- (B^, S;) < c. 

Now, since m < [n/2] — kn, at least /c„ + 1 of these observations are in Z, 
and not in a hyperplane. Then the smallest eigenvalue of S*, Ag(S*), is bounded 
below with a positive bound (for every Xj G W, the axis of the ellipsoid 



Then by ([22]), ([221) and 




Moreover, since suppi(M) = 1, we get 



5^ pi (d,(B:,s:)) <^pi(oo) 



{y : (y - B^'x, 



Sr^(y-B:'x,)<c} 






have lengths y cXjCS"^); j = l,...,q. Then Ag(S*) > a, where a is a positive 
value not depending on Z*). 

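Moreover, since $|\widehat{\Sigma}_n^*| = (\widehat{\sigma}_n^*)^{2q} = s(\mathbf{d}(\widetilde{\mathbf{B}}_n^*, \widetilde{\Sigma}_n^*))^{2q}$, by Lemma 10 the largest 
eigenvalue of $\widehat{\Sigma}_n^*$ is bounded above. 

To see that $\|\widehat{\mathbf{B}}_n^*\|$ is bounded consider the set 

\[ \mathcal{C}(\widehat{\mathbf{B}}_n^*, \widehat{\Sigma}_n^*) = \{ (\mathbf{v}, \mathbf{w}) : (\mathbf{w} - \widehat{\mathbf{B}}_n^{*\prime}\mathbf{v})'\, \widehat{\Sigma}_n^{*-1} (\mathbf{w} - \widehat{\mathbf{B}}_n^{*\prime}\mathbf{v}) \leq c \} \]

that, as we saw, contains $k_n + 1$ observations of $\mathbf{Z}$ that are not lying on a hyperplane. 

Since for symmetric matrices $\mathbf{A}$ of dimension $q \times q$ we have that 

\[ \lambda_q(\mathbf{A}) = \inf_{\mathbf{v}} \frac{\mathbf{v}'\mathbf{A}\mathbf{v}}{\mathbf{v}'\mathbf{v}} \]

and $\lambda_q(\mathbf{A}^{-1}) = 1/\lambda_1(\mathbf{A})$, it follows that 

\[ \|\mathbf{w} - \widehat{\mathbf{B}}_n^{*\prime}\mathbf{v}\|^2 \leq (\mathbf{w} - \widehat{\mathbf{B}}_n^{*\prime}\mathbf{v})'\, \widehat{\Sigma}_n^{*-1} (\mathbf{w} - \widehat{\mathbf{B}}_n^{*\prime}\mathbf{v})\, \lambda_1(\widehat{\Sigma}_n^*) \leq \lambda_1(\widehat{\Sigma}_n^*)\, c, \]

in particular, for $\mathbf{w} = \mathbf{0}$, 

\[ \|\widehat{\mathbf{B}}_n^{*\prime}\mathbf{v}\|^2 \leq \lambda_1(\widehat{\Sigma}_n^*)\, c. \]

Since $\mathcal{C}(\widehat{\mathbf{B}}_n^*, \widehat{\Sigma}_n^*)$ contains $k_n + 1$ points, there exists a constant $g$ not depending 
on $\widehat{\mathbf{B}}_n^*$ or $\widehat{\Sigma}_n^*$ such that $\|\mathbf{v}\| \leq g$ implies $(\mathbf{v}, \mathbf{0}) \in \mathcal{C}(\widehat{\mathbf{B}}_n^*, \widehat{\Sigma}_n^*)$. Then we have that 

\[ \sup_{\|\mathbf{v}\| \leq g} \|\widehat{\mathbf{B}}_n^{*\prime}\mathbf{v}\|^2 \leq \lambda_1(\widehat{\Sigma}_n^*)\, c, \]

that implies 

\[ \|\widehat{\mathbf{B}}_n^{*\prime}\|^2 \leq \lambda_1(\widehat{\Sigma}_n^*)\, c/g^2, \]

where $\|\cdot\|$ is the spectral norm defined in (A.1). Then, since $\|\cdot\|_2$ and $\|\cdot\|$ are 
equivalent, there exists a constant $\beta > 0$ such that 

\[ \|\widehat{\mathbf{B}}_n^*\|_2 = \|\widehat{\mathbf{B}}_n^{*\prime}\|_2 \leq \beta \|\widehat{\mathbf{B}}_n^{*\prime}\| \leq (\beta/g) \sqrt{\lambda_1(\widehat{\Sigma}_n^*)\, c} \]

for all $\mathbf{Z}^* \in \mathcal{Z}_m$. This proves the Theorem. ■ 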
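
The eigenvalue facts used in the last step of the proof can be illustrated numerically. The sketch below uses an arbitrary positive definite matrix and random directions (assumptions made only for this illustration) to check that Rayleigh quotients never fall below the smallest eigenvalue, that the smallest eigenvalue of the inverse is the reciprocal of the largest eigenvalue, and the resulting bound on the squared norm of a vector w.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(3, 3))
A = M @ M.T + np.eye(3)                  # symmetric positive definite matrix
eigvals = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

# Rayleigh quotients v'Av / v'v for many random directions v.
V = rng.normal(size=(1000, 3))
rayleigh = np.einsum("ij,jk,ik->i", V, A, V) / np.einsum("ij,ij->i", V, V)

# Smallest eigenvalue of the inverse matrix.
lam_min_inverse = np.linalg.eigvalsh(np.linalg.inv(A))[0]

# Consequence used in the proof: ||w||^2 <= lambda_1(A) * w' A^{-1} w.
w = rng.normal(size=3)
lhs = w @ w
rhs = lam_max * (w @ np.linalg.inv(A) @ w)

print(rayleigh.min(), lam_min, lam_min_inverse, 1.0 / lam_max, lhs, rhs)
```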



Before proving Theorem H] we need to prove several auxiliary Lemmas. 

Lemma 11. Assume we observe z eM} with distribution Hg^ e^, where 61 G M™^ 
and 62 G M."^^ . Consider a functional M- estimate that is Fisher consistent for 
e = (61,62), T{H) = {Ti{H),T2{H)) and an initial estimate ofT{H), To{H) = 
(To,i(i?),To,2(i^)), such that 

EH{h{z, T^{H), T2{H), S{To{H), H))) = 0, 

where h : M^+^i+^^a+i ^ M™i a differentiable function and S : x Q ^ 

M, where Q is the space of distributions on M."^i+"^'^ , Suppose that T satisfy the 
following strong Fisher consistency condition: 

Eue^^e, ih{^: 0u 02, S)) = for all S, (A.25) 

and 

Eue^^e, (^3(z, 01, 02, S)) = Q for all S, (A.26) 

where hi, 1 < i < A, is the derivative of h with respect to the ith argument. As- 
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sume that the partial derivatives of EHg_^ g^{h-i{z, Oi, O2, S{0, Hq^ q,^))) can be ob- 
tained differentiating with respect to each parameter inside the expectation. Then 
the influence function of Ti is given by 

x{h{zo,e,,02,S{0,He,,e,)))- 
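Proof: Let $H_\varepsilon = (1 - \varepsilon)H_{\theta_1,\theta_2} + \varepsilon\delta_{\mathbf{z}_0}$. Then $T(H_\varepsilon)$ satisfies 

\[ (1 - \varepsilon)\, E_{H_{\theta_1,\theta_2}}\big( h(\mathbf{z}, T_1(H_\varepsilon), T_2(H_\varepsilon), S(T_0(H_\varepsilon), H_\varepsilon)) \big) + \varepsilon\, h(\mathbf{z}_0, T_1(H_\varepsilon), T_2(H_\varepsilon), S(T_0(H_\varepsilon), H_\varepsilon)) = 0. \]

The proof of the Lemma follows immediately by differentiating the above expression 
with respect to $\varepsilon$ at $\varepsilon = 0$ and using (A.25) and (A.26). ■ 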
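
The differentiation step can be spelled out as follows. This is only a sketch under the assumptions of Lemma 11; the shorthand $s_0 = S(\theta, H_{\theta_1,\theta_2})$, $E = E_{H_{\theta_1,\theta_2}}$ and $\dot{S}_0$ (the derivative of $S(T_0(H_\varepsilon), H_\varepsilon)$ at $\varepsilon = 0$) is introduced here for convenience and does not appear in the original text.

```latex
% Derivative of the contaminated estimating equation at epsilon = 0,
% all functions evaluated at (theta_1, theta_2, s_0):
\begin{align*}
0 &= -E\,h(\mathbf{z},\theta_1,\theta_2,s_0)
     + E\,h_2(\mathbf{z},\theta_1,\theta_2,s_0)\, IF(\mathbf{z}_0,T_1)
     + E\,h_3(\mathbf{z},\theta_1,\theta_2,s_0)\, IF(\mathbf{z}_0,T_2) \\
  &\quad + E\,h_4(\mathbf{z},\theta_1,\theta_2,s_0)\, \dot{S}_0
     + h(\mathbf{z}_0,\theta_1,\theta_2,s_0).
\end{align*}
% By (A.25), E h = 0; since (A.25) holds for all values of the fourth argument,
% differentiating it with respect to that argument gives E h_4 = 0;
% by (A.26), E h_3 = 0. Hence, if E h_2 is nonsingular,
\begin{equation*}
IF(\mathbf{z}_0,T_1)
  = -\big( E\,h_2(\mathbf{z},\theta_1,\theta_2,s_0) \big)^{-1}
     h(\mathbf{z}_0,\theta_1,\theta_2,s_0).
\end{equation*}
```

In particular, neither the influence function of $T_2$ nor that of the initial estimate enters the influence function of $T_1$.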



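The following lemma shows, for the case $\Sigma_0 = \mathbf{I}$, that the functional MM-estimates 
$T_1$ and $T_2/S^2$ are Fisher consistent for $\mathbf{B}_0$ and $\mathbf{I}$, respectively. 

Lemma 12. Let $\mathbf{z} = (\mathbf{y}', \mathbf{x}')'$ be a random vector that satisfies the MLM (1.1) 
with parameters $\mathbf{B}_0$ and $\Sigma_0 = \mathbf{I}$, where $\mathbf{x}$ satisfies (A2) and the distribution of 
$\mathbf{u} = \mathbf{y} - \mathbf{B}_0'\mathbf{x}$ satisfies (A3). Let $\rho_1$ be a ρ-function that satisfies (A1) and $\Sigma \neq \mathbf{I}$ 
such that $|\Sigma| = 1$. Then 

\[ E_{H_0}\, \rho_1\Big( \frac{((\mathbf{y} - \mathbf{B}'\mathbf{x})'\Sigma^{-1}(\mathbf{y} - \mathbf{B}'\mathbf{x}))^{1/2}}{\sigma_0} \Big) > E_{H_0}\, \rho_1\Big( \frac{((\mathbf{y} - \mathbf{B}_0'\mathbf{x})'(\mathbf{y} - \mathbf{B}_0'\mathbf{x}))^{1/2}}{\sigma_0} \Big). \]

This lemma follows immediately from Lemma A.10 of [8]. 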
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Lemma 13. Consider the same assumptions of Theorem [^J and suppose that 
So = I. Then, if Hq is the distribution of (y',x')', we have that 

(z) E^, ^ ^—^^ j=Ofor all S, 

(zz) Eh, {W{d{Bo, So)/5)t;ec(u(Bo)x')) = for all S. 
Proof: (i) By f[05|) we have 



V ^ec(S)/ J ^\ 2S\\u 

Since the distribution of $\mathbf{u}$ is assumed elliptical with $\Sigma_0 = \mathbf{I}$, for any function $h$ 
we have $E_{H_0}(h(\|\mathbf{u}\|)\, u_i u_j u_l x_k) = 0$. Then, since all the elements of the right side 
of the above equation have this form, part (i) of the lemma is proved. Part (ii) follows 
from $E_{H_0}(h(\|\mathbf{u}\|)\, u_i x_j) = 0$ for all $i$ and $j$. ■ 

Proof of Theorem 4: Assume $\mathbf{z} = (\mathbf{y}', \mathbf{x}')'$ satisfies the MLM (1.1). Consider 
first the case $\Sigma_0 = \mathbf{I}$. Using Lemma 11 with $\theta_1 = \mathrm{vec}(\mathbf{B}_0')$, $\theta_2 = \mathrm{vec}(\Sigma_0)$, 
$S(T_0(H), H) = S(H)$ and Lemmas 12 and 13 we obtain 



/F(zo,Ti,(Bo,I)) 



dEH.W (rf(Bo, So)/(To) vec((y - B[,x)x') \ ~' 



dvec{B'y 

xW (||yo - B[)Xo||/ao) vec((yo - B'.^oH)- (A.27) 
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By flA.16p and flA.17p and the equality vec(ux') = (x (g) Iq)u we have that 



dEn.W (rf(Bo, So)/ao) vec((y - B^x)x') 



9vec(B')' 
W KBo,ao%)) 



vec((y-B[,x)x')(y-B[,x)'(x'®I, 



W{d{Bo,allg)) (xx'®I,) 

\ Co J ao||u|| V c^o 



(Xx'®Ig). 



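Since the distribution of $\mathbf{u}$ is assumed elliptical with $\Sigma_0 = \mathbf{I}$, for any function 
$h$, $E_{F_0}(h(\|\mathbf{u}\|)\, u_i u_j) = 0$ if $i \neq j$ and $E_{F_0}(h(\|\mathbf{u}\|)\, u_i^2) = E_{F_0}(h(\|\mathbf{u}\|)\, \|\mathbf{u}\|^2)/q$. Then 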

(rf(Bo, So)/(To) vec((y - B^x)x') 



9vec(B')' 



-E,,w'im Hull _ E,,w I M , s I) 



O"0 / O-Q 

EF,W'{\\u\\/ao) ||u| 



(EgoXx'®I). 



(A.28) 



Combining (A.27) with (A.28) and using the matrix equality $\mathrm{vec}(\mathbf{C}'\mathbf{A}) = (\mathbf{A}' \otimes 
\mathbf{I})\mathrm{vec}(\mathbf{C}')$, we obtain the proof of the Theorem in the case $\Sigma_0 = \mathbf{I}$. 
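
The two moment identities for spherical errors used in this proof can be checked by simulation. The sketch below assumes standard normal errors (one particular elliptical distribution with identity dispersion) and an arbitrary function h; both choices, the sample size and the variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 200_000, 3
u = rng.normal(size=(n, q))              # spherical errors (Sigma_0 = I)
norm_u = np.linalg.norm(u, axis=1)
h = np.exp(-norm_u)                      # an arbitrary bounded function of ||u||

cross_moment = np.mean(h * u[:, 0] * u[:, 1])    # estimates E h(||u||) u_1 u_2
second_moment = np.mean(h * u[:, 0] ** 2)        # estimates E h(||u||) u_1^2
radial_moment = np.mean(h * norm_u**2) / q       # estimates E h(||u||) ||u||^2 / q

print(cross_moment, second_moment, radial_moment)
```

Up to Monte Carlo error, the cross moment is zero and the last two quantities coincide, in agreement with the identities used above.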

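For the general case, let $\mathbf{R}$ be a matrix such that $\Sigma_0 = \mathbf{R}\mathbf{R}'$ and consider the 
following transformation $\mathbf{y}^* = \mathbf{R}^{-1}\mathbf{y}$. Then $\mathbf{y}^* = \mathbf{B}_0^{*\prime}\mathbf{x} + \mathbf{u}^*$, with $\mathbf{u}^* = \mathbf{R}^{-1}\mathbf{u}$ 
and $\mathbf{B}_0^* = \mathbf{B}_0\mathbf{R}'^{-1}$. Since the distribution of $\mathbf{u}^*$ is given by the density (4.1) with 
$\Sigma_0 = \mathbf{I}$ and 

\[ \mathrm{vec}(\mathbf{B}_0') = \mathrm{vec}(\mathbf{R}\mathbf{B}_0^{*\prime}), \]

by the affine-equivariance of the estimate, we have that 

\[ IF(\mathbf{z}_0, T_1, \mathbf{B}_0, \Sigma_0) = \mathbf{R}\, IF((\mathbf{R}^{-1}\mathbf{y}_0, \mathbf{x}_0), T_1, \mathbf{B}_0^*, \mathbf{I}). \]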
Before proving Theorem 5 we need to prove several auxiliary lemmas. For 
simplicity we will assume that the initial estimator $\widetilde{\mathbf{B}}_n$ is regression- and affine-equivariant 
and $\widetilde{\Sigma}_n$ is affine-equivariant and regression-invariant. Then without 
loss of generality we can assume, due to Remark 3, that $\mathbf{B}_0 = \mathbf{0}$ and $\Sigma_0 = \mathbf{I}$. 
These assumptions are not essential for the proofs. 

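Lemma 14. Let $(\mathbf{y}_i', \mathbf{x}_i')$, $1 \leq i \leq n$, be a random sample of the model (1.1) with 
parameters $\mathbf{B}_0$ and $\Sigma_0$, where the $\mathbf{x}_i$ are random and $\Sigma_0 = |\Sigma_0|^{1/q}\, \Gamma_0$, and let $\rho_0$ 
be a ρ-function. Assume that the initial estimates $\widetilde{\mathbf{B}}_n$ and $\widetilde{\Sigma}_n$ are consistent for 
$\mathbf{B}_0$ and $\Gamma_0$ respectively; then $\widehat{\sigma}_n$ is consistent for $\sigma_0$ defined by the equation (5.1). 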

Proof: Take s > 0, then by Lemma IHl we can find 6 > such that 

Eh, (inf Po (((y - B'x)'I]-^(y - B'x))'/' /{a^ -s)))>b + S 

and 

(inf Po (((y - B'x)'S-i(y - B'x))'/' /{a, + e))) < b - 5 

where £ = {(B, S) G x : ||B|| < 6, ||S - IJ| < 6}. By the law of large 

numbers we have 

lim — > mf Po -, r ] > b + d a.s. 

n — >oo n £ \ ((To — / 

and 

lim — inf Po I ^^-^ — — rl <h — 5 a.s.. 

1=1 ^ ^ ' ' 
Then, since $\lim_{n \to \infty} (\widetilde{\mathbf{B}}_n, \widetilde{\Sigma}_n) = (\mathbf{0}, \mathbf{I}_q)$ a.s., we have 

1 ^ / di(B„,S„) \ . , , 

hm - > po —, ^ > + a.s. 

n^oo n y ((To - e) y 

and 

hm — > po —, ^ < — d a.s.. 

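Therefore, by the monotonicity of $\rho_0$, with probability 1 there exists $n_0$ such that 
for all $n \geq n_0$ we have $\sigma_0 - \varepsilon \leq \widehat{\sigma}_n \leq \sigma_0 + \varepsilon$, i.e. $\lim_{n \to \infty} \widehat{\sigma}_n = \sigma_0$ a.s. ■ 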

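The following lemma ensures the existence of a constant independent of $\mathbf{B}$ 
and $\Sigma$ such that the ratio between the probability of the ellipsoid $\{(\mathbf{y}', \mathbf{x}') : 
(\mathbf{y} - \mathbf{B}'\mathbf{x})'\Sigma^{-1}(\mathbf{y} - \mathbf{B}'\mathbf{x}) \leq k\}$ and this constant is bounded by the root of each 
eigenvalue of $\Sigma$. 

Lemma 15. Suppose that the distribution of $\mathbf{y}$ satisfies (A3) with $\Sigma_0 = \mathbf{I}_q$ and 
that $\mathbf{y}$ is independent of $\mathbf{x}$. Given $(\mathbf{B}, \Sigma) \in \mathbb{R}^{p \times q} \times \mathcal{S}_q$ and $k > 0$, consider 

\[ a(\mathbf{B}, \Sigma; k) = E_{H_0}\, I\big( (\mathbf{y} - \mathbf{B}'\mathbf{x})'\Sigma^{-1}(\mathbf{y} - \mathbf{B}'\mathbf{x}) \leq k \big), \tag{A.29} \]

where $H_0$ is the distribution of $(\mathbf{y}', \mathbf{x}')$. Then there exists a constant $K_1$ independent 
of $\mathbf{B}$ and $\Sigma$ such that 

\[ a(\mathbf{B}, \Sigma; k) \leq K_1\, \lambda_j(\Sigma)^{1/2} \quad \text{for all } j,\ 1 \leq j \leq q. \]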

Proof: Note that V'SV = A where V is an orthogonal matrix of q x q and A 
is a diagonal matrix whose nonzero elements are the eigenvalues of S. Using the 
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change of variables y — > V'y and (A3) we obtain that for each j = 1, . . . ,q 
E{I{iy- B'x)'S-i(y - B'x) < |x = f3) 

/o(y'y)rfy 

(y-(BV)'/3)'A-i(y-(BV)'/3)<K 



< 



y,-((BV)'/3),|<^Aj{S)« \ -^j 
9-1 



< 2^/Aj(S)k 



Then if we choose 



since Ki does not depend on (3 we obtain the desired inequahty. ■ 

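Lemma 16. Under the assumptions of Theorem 5 there exist positive constants 
$\varepsilon$, $L_1$ and $L_2$ such that 

\[ \limsup_{n \to \infty} \|\widehat{\mathbf{B}}_n\|_2 \leq L_2 \quad \text{a.s.} \tag{A.30} \]

and 

\[ \varepsilon \leq \liminf_{n \to \infty} \|\widehat{\Gamma}_n\| \leq \limsup_{n \to \infty} \|\widehat{\Gamma}_n\| \leq L_1 \quad \text{a.s.} \tag{A.31} \]

with $\widehat{\Gamma}_n = |\widehat{\Sigma}_n|^{-1/q}\, \widehat{\Sigma}_n$. 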

Proof: Let P be the measure on M'^ x W whose density is the product of /o(u) 
given in (14.1 p and the density of x,, (70 (x). According to Theorem 4.2 of Ranga 
Rao [19] we have 



hm sup \PniC) - P{C)\ = a.s. (A.32) 

n — i^oo ccRP+1, C convex 
where $P_n$ is the empirical measure induced by the sample. 
By Lemma [H] there exist uq and ai such that 



ai > a„ (A.33) 



for all n > uq. If we consider the set 



where k is the constant that appears in (Al), by flA.32p we can conclude that for 
large enough n 

P(£„) > P„(£„) - 6/2 

almost surely. 

By ([23D, (0O3|) and ([22]) we have 

n 



\ CTi n ^ \ ar. 



then by (Al), 



Pn(£n) = ^tt{(yMX.) : (y. - B;x,)T-^(y. - K^^)/al <K}>b 



and therefore P{8,n) > b/2 almost surely for n large enough. 

By Lemma[l5]P(£„) < Aj(f„y/ViKi for all 1 < j < n, then if 5 = byiAnjal) 
for n large enough we have that Aj(r„) > 6, almost sure, for all j, in particular 
for Ai(r„) = ||r„||. Then since |r„| = 1 we have that there is a constant Li > 
such that for n large enough ||r,„|| < Li. 

By ( IA.34P and Lemma [T3] to prove flA.30p it would be enough to show that 
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for any cr > there exist L2 and r/ > such that 

hm inf lypj^li^ll!^ I >6 + r/ a.s.. (A.35) 



n — s-oo ||B||>L2 n ^ — ' V a 
i=l 



By the Lebesgue dominated convergence Theorem, it is easy to show that for 
any cr > 

By (A2), there exist > 0, 7 > and a finite number of sets Ci, 62, . . . , Cs 
included in W^'^ such that 

IJCi D e = {B G M^^" : IIBII2 = 1} (A.37) 



i=l 



and 



Pgo( inf ||B'x|| > ^) > 6 + 7. (A.38) 



By flA.361) we can find Mi and 77 > such that 



i^ + l)EFApi\^-%^\ \>h + 2r^. (A.39) 



1 ^ 



Theb by fOOSj) and fOOOj) we have 



|y|| - Ml 



E inf /(||B'x|| > v?)pi , > 6 + 2r/. (A.40) 



Bee. 



r 1/2 



Let M2 be such that 

PFo(||y||>M2)<r/, (A.41) 
take M = max{Mi, M2} and L2 = M/ip, then by f[OT]) and f[07l) we have 
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1^ JM^\ ^_±,f\\l^ 

1 fM^") j(||B'x|| > v.)/(||y.|| < L,^) 



> 

IIB 



> 

i<i< 



Finally, using the Law of Large Numbers, (IA.40p and f lA.4ip we get flA.35p and 
this proves flA.30p . ■ 



Lemma 17. Let g : M'' x (M™^" x M'"^*) — > R continuous and let Q be a 
probability distribution on such that for some 6 > we have 



Eq I sup |g(z,(A,V))| I < oo, 

(A,V)-(Ao,Vo)|||<<5 



where \\\ ■ \\\ is the norm defined in flA.Sp . Let (A„, V„) be a sequence of estimates 

inW^^"^ X R''^* such that limn ^oo(A„,V„) = (Ao,Vo) a.s.. Then z/zi,...,z„, 

are i.i.d. random variables in R^ with distribution Q, we have 

n 

lim (l/n) V g(zi, (A„, V„)) = ^og(z, (Aq, Vq)) a.s.. 

n — ^oo ' ' 

i=l 

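Proof: To prove the Lemma it suffices to show that for any $\varepsilon > 0$ there exists 
$\eta > 0$ such that 

\[ \limsup_{n \to \infty}\ \sup_{|||(\mathbf{A}, \mathbf{V}) - (\mathbf{A}_0, \mathbf{V}_0)||| \leq \eta}\ (1/n) \sum_{i=1}^{n} g(\mathbf{z}_i, (\mathbf{A}, \mathbf{V})) \leq E_Q\, g(\mathbf{z}, (\mathbf{A}_0, \mathbf{V}_0)) + \varepsilon \tag{A.42} \]

and 

\[ \liminf_{n \to \infty}\ \inf_{|||(\mathbf{A}, \mathbf{V}) - (\mathbf{A}_0, \mathbf{V}_0)||| \leq \eta}\ (1/n) \sum_{i=1}^{n} g(\mathbf{z}_i, (\mathbf{A}, \mathbf{V})) \geq E_Q\, g(\mathbf{z}, (\mathbf{A}_0, \mathbf{V}_0)) - \varepsilon. \tag{A.43} \]

By the Lebesgue dominated convergence Theorem we can take $0 < \eta < \delta$ such 
that 

\[ E\Big( \sup_{|||(\mathbf{A}, \mathbf{V}) - (\mathbf{A}_0, \mathbf{V}_0)||| \leq \eta} g(\mathbf{z}, (\mathbf{A}, \mathbf{V})) \Big) \leq E_Q\, g(\mathbf{z}, (\mathbf{A}_0, \mathbf{V}_0)) + \varepsilon. \]

Then using the Law of Large Numbers we obtain 

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n}\ \sup_{|||(\mathbf{A}, \mathbf{V}) - (\mathbf{A}_0, \mathbf{V}_0)||| \leq \eta} g(\mathbf{z}_i, (\mathbf{A}, \mathbf{V})) = E\Big( \sup_{|||(\mathbf{A}, \mathbf{V}) - (\mathbf{A}_0, \mathbf{V}_0)||| \leq \eta} g(\mathbf{z}, (\mathbf{A}, \mathbf{V})) \Big) \quad \text{a.s.} \]

and get (A.42). A similar procedure is performed to prove (A.43). ■ 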



Proof of Theorem [5 Consider 

e(5, Li, L2) = {(B, r) e X §, : 5 < ||r|| < Li, |r| = l and 1IBII2 < L2}, 
61(5) = {(B,r) G e(5,Li,L2) : IIBII2 > e} and 

62(5) = {(B,r) e e(5,Li,L2) : ||r -1,11 > e}. 

According to the Lemmas [T^ and [TBI and 02.61) . it would be enough to show that 
given £1 > 0, £2 > and Li and L2 arbitrarily large, there exist 7 > and 
cTi > (To such that 

1^ /rfi(B,r)\ ^ 
lim„^^ mf - > , Pi > EfqPi + 7 a.s., 

(A.44) 

1^ fdi{B,T)\ ^ /(u'u)i/2\ 

(B,r)ee2(£2) n \ ai J \ uq J 

(A.45) 






and 



,^L±J^J^].E,.Ji^) a... ,A.46) 



By Lemma W2\ we have 



^ ,'d(B,r)\ ^ /(u'u)i/2\ 

Ep,{^]>Ep,(^] (A.47) 



for all B e RP^^ and T G with |r| = 1 such that T ^ I,. 

By Lemmaini flA.47p and the Lebesgue dominated convergence Theorem, using 
a standard compactness argument we can find ai > (Jq, 7 > and a finite number 
of sets, Ci, . . . , Cs, such that 

Eh, M Pi -^-^ > Ep„p, ^ + 7 A.48 

(B,r)ee, V ^1 / \ (^0 J 



and 



Uej^ei(£i). (A.49) 



By f[09l) we have 

'd,{B,T) 



lim inf — ^^pi f - 

i^oo (B,r)Gei(ei) n ^ V 



0-1 



> inf lim — inf pi ( 

i<j<s n — !>oo n ^-^ (B,r)eej V 



rf,.(B,r) 

0-1 



Then by (IA.48P and the Law of Large Numbers we get flA.44p . flA.45|) is proved 



similarly to (IA.44P and (IA.46P is a consequence of Lemma [T71 ■ 

Next we will give some definitions and lemmas that will be necessary to prove 
the asymptotic normality of MM-estimates B„. 






Definition 5. Let $\mathcal{F}$ be a class of real-valued functions on a set $\mathcal{X}$. An envelope 
for $\mathcal{F}$ is any function $F$ such that $|f| \leq F$ for all $f$ in $\mathcal{F}$. 

If $\mu$ is a measure on $\mathcal{X}$ for which $F$ is integrable, it is natural to think of $\mathcal{F}$ as 
a subset of $L^1(\mu)$, the space of all $\mu$-integrable functions. This space is equipped 
with a distance defined by the $L^1(\mu)$ norm. Then the closed ball with center $f_0$ 
and radius $R$ consists of all $f$ in $L^1(\mu)$ for which $\int |f - f_0|\, d\mu \leq R$. 

Definition 6. The class $\mathcal{F}$ is Euclidean for the envelope $F$ if there exist positive 
constants $a$ and $r$ with the following property: if $0 < \varepsilon \leq 1$ and if $\mu$ is any 
measure for which $\int F\, d\mu < \infty$, then there are functions $f_1, \dots, f_m$ in $\mathcal{F}$ such that 

(i) $m \leq a\varepsilon^{-r}$, 

(ii) $\mathcal{F}$ is covered by the union of the closed balls with radius $\varepsilon \int F\, d\mu$ and centers 
$f_1, \dots, f_m$. 

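In order to prove the following lemma we need Lemma 2.13 from Pakes and 
Pollard [18]. This is stated below: 

Lemma 18. Let $\mathcal{F} = \{f(\cdot, \xi) : \xi \in \mathcal{C}\}$ be a class of functions on $\mathcal{X}$ indexed by a 
bounded subset $\mathcal{C}$ of $\mathbb{R}^d$. If there exists an $\alpha > 0$ and a nonnegative function $\varphi(\cdot)$ 
such that 

\[ |f(x, \xi) - f(x, \zeta)| \leq \varphi(x)\, \|\xi - \zeta\|^{\alpha} \quad \text{for } x \in \mathcal{X} \text{ and } \xi, \zeta \in \mathcal{C}, \]

then $\mathcal{F}$ is Euclidean for the envelope $|f(\cdot, \xi_0)| + R\varphi(\cdot)$, where $\xi_0$ is an arbitrary 
point of $\mathcal{C}$ and $R = (2\sqrt{d}\, \sup_{\mathcal{C}} \|\xi - \xi_0\|)^{\alpha}$. 

The proof of Lemma 18 can be found in [18]. 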
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Lemma 19. If (A4), (A5) and (A6) hold, then there exists a function 0{^), that 
to each C, in x vec{Sq) assigns a pair (B, S) in M.'^^'p x Sg, and a bounded 
subset e o/M^P X vec{Sq), such that (Bq, (Tq^o) G for which each of the 

classes of functions 

^kj = {<P,,{z;e{0):^ee}, (A.50) 

where 0fcj(z; 6) = W {d(B, S)) {y^ — h'j.x.)xj and is the kth column vector of 
the matrix B, is Euclidean for certain envelope Fkj with EhoF^j < oo. 

Proof: For each ^ in M.'^^ x vec(§g) there exists a unique pair (B, S) in W^^^ x Sq 
such that ^ = (vec(B)', vec(S"^/^)'), then define the function 6{-) as follows: 
6>((vec(B)',vec(S-V2)/)) = (B, S) . 

Let e > and 6 = 2||(Tg'^SQ ^^^||, considering the norm defined in (lA.Sp . we 
denote by to the ball of radius e and center (Bq, (TqEq), then define 

e = ({(B, S) G X : (B, S) e and HS-^/^H < 6}) . 

Let ^ and ^* be any two elements of C such that 6{^) = (B, S) and ^(.C*) = 
(B*,S*), by the Mean Value Theorem there is a value c between (i(B, S) and 
ci(B*,S*) such that 

|W^((i(B*,S*))-iy(rf(B,S))| = |Vr'(c)||rf(B,S) -(i(B*,S*)|. (A.51) 

Since W and its derivative are continuous and with compact support there exists 
a constant M such that < M and |l^'('u)| < M for all u, using this and 
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flA.Sip we have 



|</),,(z; 6(0) - <^fc,(z; 6(0)1 < \W'{c)\\d{B, S) - d{B\ i:*)\\{y, - hl'^)x,\ 

+ |iyKB,S))||brxx,-Kxx,| 

< M{|rf(B,S*)-rf(B*,S)|(|y,| + ||b*||||x||) 
+ ||B*'-B'||2||x||}|x,| 

< M\x,\ {M(B,S*) -rf(B*,S)| (\yk\+eM) 

+iie-riiiix|i}. 



Applying inequahties of matrix norms we have 



\d{B, S) - d{B*, S*)| < (y _ g/^) _ ^*-i/2 _ 3*/^^ II 

lally II + ||S~i/2B'x - S*~^/2B*'x|| 
bllyll + ||S-i/2||||B'-B*'||2||x|| 
|2||B*||2||x|| 

bllyll +(5||B'-B*'||2||x|| 
Ih^llxll 

< {(e + 5)||x|| + ||y||}||(vec(B-B*)',vec(S"V2_s-i/2y)|| 
= {(^ + 5)l|x|| + ||y||}||e-ri|. 



< 


||S" 


1/2 _ 




-1/2 


< 


||S" 


1/2 _ 




-1/2 


+ 


||S* 


-1/2 _ 


- 


-1/2 


< 


||S^ 


1/2 _ 




-1/2 


+ 


||S^ 


1/2 _ 




-1/2 



Then, if we define 



iPkjiz) =M{((e + (5)||x|| + ||y||)(|i/feXj-| +£||x|||xj-|) + ||x|||xj-|} (A.52) 
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we have that 



Then we can apply Lemma [T51 and conclude that 'Ski is euclidean for the envelope 



with ^0 e e such that 6>(^o) = (Bo,(TgSo) and R = 2^/q{p + q) supg ||^-^o||- The 
proof of EhoF^j < follows immediately using that < M and expanding 

flA.521) as a sum of products, and bounding their respective means by means of 
(A6). ■ 
Before proving Theorem 6 we need to state Lemma 2.16 (page 1036) of Pakes 
and Pollard [18]. 

Lemma 20. Let $\mathcal{F}$ be a Euclidean class with envelope $F$ such that $\int F^2\, dP < \infty$. 
For each $\eta > 0$ and $\varepsilon > 0$ there exists a $\delta > 0$ such that 

\[ \limsup_{n \to \infty} P\Big\{ \sup_{[\delta]} |\nu_n(f_1) - \nu_n(f_2)| > \eta \Big\} < \varepsilon, \]

where $[\delta]$ represents the set of all pairs of functions in $\mathcal{F}$ with 

\[ \int (f_1 - f_2)^2\, dP \leq \delta^2 \]

and $\nu_n(f) = n^{-1/2} \sum_{i=1}^{n} \big[ f(\zeta_i) - \int f\, dP \big]$, where $\zeta_1, \zeta_2, \dots, \zeta_n$ are independent 
observations sampled from the distribution $P$. 



The proof of Lemma [20] can be found in 
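Proof of Theorem 6: We denote $\widehat{\theta}_n = (\widehat{\mathbf{B}}_n, \widehat{\Sigma}_n)$ and $\theta_0 = (\mathbf{B}_0, \sigma_0^2\Sigma_0)$. 

Since we assumed that the distribution of the errors $\mathbf{u}$ is elliptical with density 
of the form (4.1), for any function $h$ we have $E_{H_0}(x_j u_k\, h(\mathbf{u}'\Sigma_0^{-1}\mathbf{u})) = 0$. This 
implies that 

\[ E_{H_0}\big[ W(d(\mathbf{B}, \sigma_0^2\Sigma_0))\, (\mathbf{y} - \mathbf{B}'\mathbf{x})\mathbf{x}' \big] \tag{A.53} \]

vanishes at $\mathbf{B} = \mathbf{B}_0$. Then $\theta_0$ is a zero of the function $\Phi(\theta) = E_{H_0}\phi(\mathbf{z}; \theta)$. 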

By Lemma [19] there exists a bounded subset C and a function 0{C,) such that 
6q is an interior point of ^(C) and since 0„ — j- 0o a-S-, G ^(C) fo^^ large 
enough, i.e., 0fcj(z;0„,) and </)fcj(z;0o) belong to the Euclidean class ^^^fcj for n 
sufficiently large. By (A5), the functions 0fcj(z; On) and (pkji'Z] Oq) are in the class 
[6] of Lemma [20] for each 6 > and n sufficiently large. Hence, 

I 6I„)) - z/„(0fc,(-; 6>o))}| (A.54) 

in probability. Then since iyn{'Pkj{-', On)) — i^n{(pkj{-', Oq)) = op{l/y/n) for all k = 
1, . . . ,q and j = 1, . . . ,p and 0fcj(z; 0) corresponds to the element h = {j ~ l)q + k 
of the function 0, we conclude that 

z/„(0(-; 6>„)) - 6>o)) = op(l/v^). (A.55) 

Since 9$/9vec(B')' is continuous in ^o, we have that 

$(B, S) = $(Bo, S) + (^^(Bo, S)) vec(B' - B[,) + r(0)vec(B' - B',) 

(A.56) 

where r{0) — when — )■ ^o- 
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Using a suitable change of variables, for all S G we have that 



(Ex,) [ I UkW (u'S-^u) /*(u'So-iu)du 

'{u:«fe>0} 



{u:«fc<0} 



(Exj) / UkW (u'S-^u) /*(u'So-iu)rfu 

V-' {u:-Ufc>0} 

/ UkW (u'S-^u) /*(u'So ^u)c/u') = 0. 



Since this holds for all /c = 1, . . . , g and j = 1, . . . , p so that 



<l>(Bo,S) = 



(A.57) 



for all S e Sg. 



By f l2.10p . the pair 0„ = (B„, S„) is a zero of the function (1/n) XliLi 
Using this, after doing some simple operations of sum and subtraction and using 
(lA.SSp . we have 



n ^ n 

= (l/n)^</.(z„6>„) = S^„(/.(z,6>0+ -Y,<^^i^^^Go)-EHJ{z,0o 

1=1 L «=i 

n ^ 

- V [0(z„ 0„) - </)(z„ 0o)] - Eh, [0(z, 0„) - 0(z, 0o)] > 

^tr J 

1 " 

- 0(Zi, ^o) - Eho(P{2, Oq) 



E/^o0(z, On) 



1=1 



+ K(0(-;0„,))-i/„(0(-;0o))] 



Eho4>{z, On) 



1 " 

- 0(Zi, 0o) - ^Ho0(z, ^o) 
n — ^ 

1=1 



op(l/v/n). 



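Since $E_{H_0}\phi(z,\theta_n)$ is equal to $\Phi(B_n,\Sigma_n)$, we can solve for $E_{H_0}\phi(z,\theta_n)$ in the above equation and replace it in the expansion (A.56) for $\Phi(B_n,\Sigma_n)$. Together with (A.57), we obtain the result

\[
0 = \left( \frac{\partial\Phi}{\partial\mathrm{vec}(B')'}(B_0,\Sigma_n) \right) \mathrm{vec}(B_n' - B_0') + r(\theta_n)\,\mathrm{vec}(B_n' - B_0') + \left[ \frac{1}{n}\sum_{i=1}^n \phi(z_i,\theta_0) - E_{H_0}\phi(z,\theta_0) \right] + o_p(1/\sqrt{n}).
\]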



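Since $\partial\Phi/\partial\mathrm{vec}(B')'$ is continuous in $\theta_0$ and $r(\theta_n) = o_p(1)$, this reduces to

\[
0 = (A + o_p(1))\,\mathrm{vec}(B_n' - B_0') + \left[ \frac{1}{n}\sum_{i=1}^n \phi(z_i,\theta_0) - E_{H_0}\phi(z,\theta_0) \right] + o_p(1/\sqrt{n}). \tag{A.58}
\]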



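According to the Central Limit Theorem,

\[
\frac{1}{n}\sum_{i=1}^n \phi(z_i,\theta_0) - E_{H_0}\phi(z,\theta_0) = O_p(1/\sqrt{n}),
\]

and since $A$ is nonsingular, from (A.58) we get that $\mathrm{vec}(B_n' - B_0') = O_p(1/\sqrt{n})$. Then (A.58) can be rewritten as

\[
0 = A\,\mathrm{vec}(B_n' - B_0') + \left[ \frac{1}{n}\sum_{i=1}^n \phi(z_i,\theta_0) - E_{H_0}\phi(z,\theta_0) \right] + o_p(1/\sqrt{n}).
\]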



As we saw at the beginning of the proof, $\theta_0$ is a zero of $\Phi(\theta) = E_{H_0}\phi(z;\theta)$, and therefore

\[
\sqrt{n}\,\mathrm{vec}(B_n' - B_0') = -A^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \phi(z_i,\theta_0) + o_p(1).
\]

Since $\phi_{kj}(z,\theta_0)$ has finite mean and covariance for each $k = 1,\dots,q$ and $j = 1,\dots,p$, the Theorem is proved after applying the Central Limit Theorem. ■
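The linear representation at the end of this proof can be checked numerically in the simplest special case. The sketch below is an illustration, not the authors' procedure: it assumes $p = q = 1$ with an intercept only (a location model), a known unit scale, standard normal errors, and a bisquare $\psi$ with the hypothetical tuning constant `C`. In that case $\phi(z,\theta) = \psi(y - \mu)$ and $A = -E\,\psi'(y - \mu_0)$, which the code replaces by its sample analogue `a_hat`.

```python
import numpy as np

C = 4.0  # tuning constant of the illustrative bisquare psi (an assumption)

def psi(u):
    """Bisquare psi(u) = u * (1 - (u/C)^2)^2 on |u| <= C, zero outside."""
    return np.where(np.abs(u) <= C, u * (1.0 - (u / C) ** 2) ** 2, 0.0)

def psi_prime(u):
    """Derivative of psi: (1 - t)(1 - 5t) with t = (u/C)^2, zero outside."""
    t = (u / C) ** 2
    return np.where(np.abs(u) <= C, (1.0 - t) * (1.0 - 5.0 * t), 0.0)

def location_m_estimate(y, n_iter=100):
    """Solve sum psi(y_i - mu) = 0 by iterative reweighting from the median."""
    mu = np.median(y)
    for _ in range(n_iter):
        u = y - mu
        w = np.where(np.abs(u) <= C, (1.0 - (u / C) ** 2) ** 2, 0.0)
        mu = np.sum(w * y) / np.sum(w)
    return mu

rng = np.random.default_rng(2)
n, mu0 = 20000, 1.0
y = mu0 + rng.standard_normal(n)

mu_n = location_m_estimate(y)
a_hat = -np.mean(psi_prime(y - mu0))                 # sample analogue of A
lhs = np.sqrt(n) * (mu_n - mu0)                      # sqrt(n) (estimate - truth)
rhs = -np.sum(psi(y - mu0)) / (a_hat * np.sqrt(n))   # -A^{-1} n^{-1/2} sum phi
print(lhs, rhs)
```

The two printed numbers should nearly coincide, the gap being the $o_p(1)$ remainder of the expansion.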
Proof of Proposition 2. Consider first the case $\Sigma_0 = I_q$. The matrix $A$ defined in (6.2) can also be expressed as

\[
A = \left. \frac{\partial\, E_{H_0} W(d(B,\Sigma))\, \mathrm{vec}((y - B'x)x')}{\partial\,\mathrm{vec}(B')'} \right|_{(B_0,\,\sigma_0^2 I_q)}.
\]



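Since $W$ is differentiable with bounded derivative, we can differentiate inside the expectation. We can now proceed analogously to the proof of (A.28), and we have that

\[
A = -\left[ E_{F_0} \frac{W'(\|u\|/\sigma_0)\,\|u\|}{q\,\sigma_0} + E_{F_0} W\!\left( \frac{\|u\|}{\sigma_0} \right) \right] (E_{G_0} xx' \otimes I_q).
\]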



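Using the same arguments as before and $W(u) = \psi_1(u)/u$, we obtain

\[
A = -E_{F_0}\left[ \frac{1}{q}\,\psi_1'\!\left( \frac{\|u\|}{\sigma_0} \right) + \left( 1 - \frac{1}{q} \right) \frac{\sigma_0}{\|u\|}\,\psi_1\!\left( \frac{\|u\|}{\sigma_0} \right) \right] (E_{G_0} xx' \otimes I_q).
\]
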

Since $(E_{G_0} xx' \otimes I_q)^{-1} = (E_{G_0} xx')^{-1} \otimes I_q$, the Proposition is proved for the case $\Sigma_0 = I_q$.

For the general case, let $R$ be a matrix such that $\Sigma_0 = RR'$ and consider the transformation $y^* = R^{-1}y$. Then $B_0^* = B_0 R'^{-1}$ and $y^* = B_0^{*\prime}x + u^*$, with $u^* = R^{-1}u$. Observe that the distribution of $u^*$ is given by the density (4.1) with $\Sigma_0 = I_q$ and

\[
\mathrm{vec}(B_0') = \mathrm{vec}(R B_0^{*\prime}) = (I_p \otimes R)\,\mathrm{vec}(B_0^{*\prime}),
\]

and therefore, by the affine-equivariance of the MM-estimates, (6.4) follows. ■
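The two Kronecker-product identities used in this proof are easy to confirm numerically. The following sketch assumes arbitrary illustrative values: `sxx` is a sample second-moment matrix standing in for $E_{G_0}xx'$, `r` is a hypothetical factor $R$, and `b_star` plays the role of $B_0^*$. The helper `vec` stacks columns, which is the convention under which $\mathrm{vec}(RX) = (I_p \otimes R)\,\mathrm{vec}(X)$ holds.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2

x = rng.standard_normal((200, p))
sxx = x.T @ x / len(x)                    # stand-in for E_{G0} x x' (p x p)
r = np.array([[1.5, 0.0], [0.4, 0.8]])    # hypothetical R with Sigma_0 = R R'
b_star = rng.standard_normal((p, q))      # stand-in for B_0^* (p x q)

def vec(m):
    """Stack the columns of m into a single vector (column-major order)."""
    return m.reshape(-1, order="F")

# (E xx' (x) I_q)^{-1} = (E xx')^{-1} (x) I_q
lhs_inv = np.linalg.inv(np.kron(sxx, np.eye(q)))
rhs_inv = np.kron(np.linalg.inv(sxx), np.eye(q))

# vec(R B*') = (I_p (x) R) vec(B*')
lhs_vec = vec(r @ b_star.T)
rhs_vec = np.kron(np.eye(p), r) @ vec(b_star.T)

print(np.allclose(lhs_inv, rhs_inv), np.allclose(lhs_vec, rhs_vec))
```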
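Proof of Theorem 8. We denote the weight $\omega_{ik} = W(d_i(B^{(k)}, \Sigma^{(k)}))$ by $\omega_i$ for each $1 \le i \le n$, and $\theta^{(k)} = (B^{(k)}, \Sigma^{(k)})$ for each $k \ge 1$. Then, since $W(u)$ is nonincreasing in $|u|$ if and only if $\rho_1$ is concave (see page 326 of Maronna et al. [16]), we have, up to a positive constant factor on the right-hand side,

\[
\sum_{i=1}^n \rho_1(d_i(\theta^{(k+1)})) - \sum_{i=1}^n \rho_1(d_i(\theta^{(k)})) \le \sum_{i=1}^n \omega_i \left[ d_i^2(B^{(k+1)}, \Gamma^{(k+1)}) - d_i^2(B^{(k)}, \Gamma^{(k)}) \right], \tag{A.59}
\]

where $\Gamma^{(k+1)} = \Sigma^{(k+1)}/|\Sigma^{(k+1)}|^{1/q}$ and $\Gamma^{(k)} = \Sigma^{(k)}/|\Sigma^{(k)}|^{1/q}$.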

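Recall that for any positive definite matrix $A$, the matrix $B^{(k+1)}$ minimizes

\[
\sum_{i=1}^n \omega_i\, d_i^2(B, A).
\]

Then

\[
\sum_{i=1}^n \omega_i\, d_i^2(B^{(k+1)}, \Gamma^{(k)}) \le \sum_{i=1}^n \omega_i\, d_i^2(B^{(k)}, \Gamma^{(k)}),
\]

and therefore the sum on the right side of (A.59) is not greater than

\[
\sum_{i=1}^n \omega_i \left[ d_i^2(B^{(k+1)}, \Gamma^{(k+1)}) - d_i^2(B^{(k+1)}, \Gamma^{(k)}) \right]. \tag{A.60}
\]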



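Since

\[
\Gamma^{(k+1)} = \frac{\sum_{i=1}^n \omega_i\, u_i(B^{(k+1)})\, u_i(B^{(k+1)})'}{\left| \sum_{i=1}^n \omega_i\, u_i(B^{(k+1)})\, u_i(B^{(k+1)})' \right|^{1/q}},
\]

we have that $\Gamma^{(k+1)}$ is the sample covariance matrix of the weighted residuals $\sqrt{\omega_i}\,u_i(B^{(k+1)})$ normalized to unit determinant, which minimizes the sum of squared Mahalanobis norms of the weighted residuals $\sqrt{\omega_i}\,u_i(B^{(k+1)})$ among the matrices with determinant one, i.e., for any positive definite matrix $V$ with $|V| = 1$,

\[
\sum_{i=1}^n \omega_i\, d_i^2(B^{(k+1)}, \Gamma^{(k+1)}) \le \sum_{i=1}^n \omega_i\, d_i^2(B^{(k+1)}, V).
\]

Then, since $|\Gamma^{(k+1)}| = |\Gamma^{(k)}| = 1$, we have that (A.60) is $\le 0$. ■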
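The monotonicity argument of this proof can be observed on simulated data. The sketch below is a minimal illustration under assumptions that are not taken from the paper: the bisquare $\rho$ and its constant `c`, the fixed value `scale=1.5`, the least squares starting point, and the simulated data are all hypothetical choices (a robust initial estimate would be used in practice). Each pass computes the weights, solves a weighted least squares problem for $B$, and rescales the weighted residual scatter to unit determinant. The bisquare weight is nonincreasing, or equivalently $\rho$ is concave as a function of the squared distance.

```python
import numpy as np

def rho(t, c=4.0):
    """Bisquare rho, bounded by 1 (an illustrative choice for rho_1)."""
    t = np.minimum(np.abs(t) / c, 1.0)
    return 1.0 - (1.0 - t**2) ** 3

def weight(t, c=4.0):
    """W(t) = rho'(t) / t for the bisquare rho; nonincreasing in |t|."""
    t = np.minimum(np.abs(t) / c, 1.0)
    return 6.0 / c**2 * (1.0 - t**2) ** 2

def mahalanobis(x, y, b, gamma):
    """d_i(B, Gamma) for the residuals u_i = y_i - B' x_i."""
    u = y - x @ b
    return np.sqrt(np.einsum("ij,jk,ik->i", u, np.linalg.inv(gamma), u))

def irls(x, y, b, gamma, scale, n_iter=20):
    """Run the reweighting steps and record the objective after each one."""
    q = y.shape[1]
    history = [rho(mahalanobis(x, y, b, gamma) / scale).sum()]
    for _ in range(n_iter):
        w = weight(mahalanobis(x, y, b, gamma) / scale)
        xw = x * w[:, None]
        b = np.linalg.solve(xw.T @ x, xw.T @ y)             # weighted least squares
        u = y - x @ b
        c_mat = (u * w[:, None]).T @ u                      # weighted residual scatter
        gamma = c_mat / np.linalg.det(c_mat) ** (1.0 / q)   # unit determinant
        history.append(rho(mahalanobis(x, y, b, gamma) / scale).sum())
    return b, gamma, history

rng = np.random.default_rng(3)
n = 200
x = np.column_stack([rng.standard_normal(n), np.ones(n)])   # last column: intercept
b_true = np.array([[2.0, 0.5], [1.0, -1.0]])
chol = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 2.0]]))
y = x @ b_true + rng.standard_normal((n, 2)) @ chol.T
y[:20] += 10.0                                              # 10% shifted outliers

b_start = np.linalg.lstsq(x, y, rcond=None)[0]              # non-robust start (assumption)
b_hat, gamma_hat, history = irls(x, y, b_start, np.eye(2), scale=1.5)
print(history[0], history[-1])
```

The list `history` holds the value of $\sum_i \rho_1(d_i/\text{scale})$ after each step and should never increase, while `gamma_hat` keeps determinant one throughout.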